{"title":"Classquake: Measuring Students' Attentiveness in the Classroom","authors":"Kai Michael Höver, M. Mühlhäuser","doi":"10.1109/ism.2015.24","DOIUrl":"https://doi.org/10.1109/ism.2015.24","url":null,"abstract":"","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134125097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Joint Asymmetric Graphics Rendering and Video Encoding Approach for Optimizing Cloud Mobile 3D Display Gaming User Experience","authors":"Yao Liu, Yao Liu, S. Dey","doi":"10.1109/ISM.2015.27","DOIUrl":"https://doi.org/10.1109/ISM.2015.27","url":null,"abstract":"With the development and deployment of ubiquitous wireless networks, together with the growing popularity of mobile auto-stereoscopic 3D displays, more and more applications have been developed to enable rich 3D mobile multimedia experiences, including 3D display gaming. Simultaneously, with the emergence of cloud computing, more mobile applications are being developed to take advantage of elastic cloud resources. In this paper, we explore the possibility of Cloud Mobile 3D Display Gaming, where 3D video rendering and encoding are performed on cloud servers and the resulting 3D video is streamed to mobile devices with 3D displays over wireless networks. However, given the significantly higher bitrate requirement of 3D video, ensuring user experience is challenging under the bandwidth constraints of mobile networks. To address this challenge, different techniques have been proposed, including asymmetric graphics rendering and asymmetric video encoding. In this paper, for the first time, we propose a joint asymmetric graphics rendering and video encoding approach, in which both the encoding quality and the rendering richness of the left and right views are asymmetric, to enhance the user experience of the cloud mobile 3D display gaming system. Specifically, we first conduct extensive user studies to develop a user experience model that accounts for both video encoding impairment and graphics rendering impairment. We also develop a model relating the bitrate of the resulting video to the video encoding and graphics rendering settings. Finally, we propose an optimization algorithm that automatically chooses the video encoding and graphics rendering settings for the left and right views to ensure the best user experience under the given network conditions. Experiments conducted using real 4G-LTE network profiles on a commercial cloud service demonstrate the improvement in user experience when the proposed optimization algorithm is applied.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124963910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multispectral Texture Features from Visible and Near-Infrared Synthetic Face Images for Face Recognition","authors":"Hyungil Kim, Seung-ho Lee, Yong Man Ro","doi":"10.1109/ISM.2015.95","DOIUrl":"https://doi.org/10.1109/ISM.2015.95","url":null,"abstract":"Recently, high-performance face recognition in real-world scenarios has attracted research attention. Thanks to advances in sensor technology, face recognition systems equipped with multiple sensors have been widely researched; among them, face recognition with near-infrared imagery has been an important research topic. In this paper, the complementary effect residing in face images captured under near-infrared and visible light is exploited by combining the two distinct spectral images. We propose a new texture feature (i.e., multispectral texture feature) extraction method using synthesized face images to achieve high-performance face recognition with an illumination-invariant property. The experimental results show that the proposed method enhances the discriminative power of the features thanks to the complementary effect.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126174905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foveated High Efficiency Video Coding for Low Bit Rate Transmission","authors":"I. Cheng, Masha Mohammadkhani, A. Basu, F. Dufaux","doi":"10.1109/ISM.2015.37","DOIUrl":"https://doi.org/10.1109/ISM.2015.37","url":null,"abstract":"This work describes the design and subjective performance of Foveated High Efficiency Video Coding (FHEVC). Even though foveation has been widely used for various forms of compression since the early 1990s, we believe its use to improve HEVC is new. We consider the application of, possibly moving, foveated compression in this work and evaluate scenarios where it can be used to improve perceptual quality of videos under constrained transmission resources, e.g., bandwidth. A new method to reduce artifacts during remapping is also proposed. The preliminary implementation considers a single fovea only. Experiments summarizing user evaluations are presented to validate our implementation.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123811358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic MCU Placement for Video Conferencing on Peer-to-Peer Network","authors":"Md. Amjad Hossain, J. Khan","doi":"10.1109/ISM.2015.125","DOIUrl":"https://doi.org/10.1109/ISM.2015.125","url":null,"abstract":"In this paper, we investigate a novel Multipoint Video Conferencing (MVC) architecture suitable for Peer-to-Peer (P2P) platforms such as Gnutella. In particular, we present an election protocol (an extension to Gnutella) in which the Multipoint Control Unit (MCU) of the MVC is dynamically migrated among peers as peers join or leave. Simulation results show that this improves overall conferencing performance compared to a system with a static MCU by minimizing total traffic, individual node hotness, and video composition delay.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116847974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Interactive TV Experience Using Second Screen Mobile Applications","authors":"Mu Mu, W. Knowles, Yusuf Sani, A. Mauthe, N. Race","doi":"10.1109/ISM.2015.19","DOIUrl":"https://doi.org/10.1109/ISM.2015.19","url":null,"abstract":"The past two decades have seen a shift in multimedia consumption behaviours from collectivism and passivity to individualism and activity. This paper introduces the architectural design, implementation and user evaluation of a second screen application designed to supersede the traditional user control interface for primary screen interaction. We describe how NSMobile, our second screen application, can be used as a pervasive multimedia platform by integrating user experiences on both the second screen and the primary screen. The quantitative and qualitative evaluation of user interactions with interactive TV content also informs the future design of second screen applications.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130276847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an Efficient Algorithm to Get the Chorus of a Salsa Song","authors":"Camilo Arévalo, Gerardo M. Sarria M., M. Mora, Carlos A. Arce-Lopera","doi":"10.1109/ISM.2015.42","DOIUrl":"https://doi.org/10.1109/ISM.2015.42","url":null,"abstract":"Salsa is a well-known musical genre and part of Latin-American cultural identity. The first step towards a scientific analysis of this genre is to analyze the structure of Salsa songs; moreover, the most representative part of a Salsa song is its chorus. In this paper, we detail the design and implementation of an algorithm for extracting the chorus of any Salsa song.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129273833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Super-Resolution Method Using Spatio-Temporal Registration of Multi-Scale Components in Consideration of Color-Sampling Patterns of UHDTV Cameras","authors":"Y. Matsuo, S. Sakaida","doi":"10.1109/ISM.2015.57","DOIUrl":"https://doi.org/10.1109/ISM.2015.57","url":null,"abstract":"Ultra high-definition television (UHDTV) video contains many similar objects in a single frame because its high resolution gives it high self-similarity. In addition, typical UHDTV cameras have a single CMOS sensor with a Bayer or other color-sampling pattern. We therefore propose a super-resolution method using single-frame registration of an original image and its multi-scale components. The same registration is also performed between the original image and the multi-scale components of its past and future frames. The accuracy of the registration is enhanced by compensating the registration results in consideration of the color-sampling patterns of UHDTV cameras. Experiments show that the proposed method provides objectively better PSNR measurements and a subjectively better appearance than conventional and state-of-the-art super-resolution methods.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127896000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame Synchronization of Live Video Streams Using Visible Light Communication","authors":"Maziar Mehrabi, S. Lafond, Le Wang","doi":"10.1109/ISM.2015.26","DOIUrl":"https://doi.org/10.1109/ISM.2015.26","url":null,"abstract":"With the growth of heterogeneous social media networks and the widespread use of camera-equipped handheld devices, interactive video broadcasting services are emerging on the Internet. When a media server combines and broadcasts live-streaming video content received from heterogeneous camera-equipped devices filming a common scene from different angles, time-based alignment of the audio and video streams is required. Although many techniques and methods for video stream synchronization have been proposed or are in use, these solutions are not suitable for a non-centralized multi-camera system consisting of, for example, heterogeneous camera-equipped smartphones. This paper proposes a novel approach that harnesses the capabilities of Visible Light Communication (VLC) to provide a robust and efficient way to synchronize video streams, and presents the design and implementation of a VLC-based video synchronization prototype. Synchronization of the different video streams is provided by means of VLC through Light Emitting Diode (LED) lights and digital phone cameras: the necessary information is embedded as light patterns in the video content, which can later be extracted by processing the video streams. The main benefit of our approach is the ability to use off-the-shelf cameras, as it does not require any modification of software or hardware components in the camera devices. Moreover, VLC can also be exploited to carry other types of information, such as position, so that the receiver of the video stream has a notion of the location in which the video was recorded.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116235884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal and Spatial Evolution through Images","authors":"F. Branco, Nuno Correia, A. Rodrigues, João Gouveia, Rui Nóbrega","doi":"10.1109/ISM.2015.105","DOIUrl":"https://doi.org/10.1109/ISM.2015.105","url":null,"abstract":"Image matching algorithms are used in image search, classification and retrieval but are also useful to show how urban structures evolve over time. Images have the power to illustrate and evoke past events and can be used to show the evolution of structures such as buildings and other elements present in the urban landscape. The paper describes a process and a tool to provide a chronological journey through time, given a set of photographs from different time periods. The developed tool provides the ability to generate visualizations of a geographic location, given a set of related images, taken at different periods in time. It automatically processes comparisons of images and establishes relationships between them. It also offers a semi-automated method to define relationships between parts of images.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"226 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121480102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}