This paper proposes a novel high-capacity audio watermarking algorithm in the logarithm domain based on the absolute threshold of hearing (ATH) of the human auditory system (HAS). The key idea is to divide the selected frequency band into short frames and quantize the samples based on the HAS. Besides remarkable capacity, transparency, and robustness, the scheme provides three parameters (frequency band, scale factor, and frame size) that facilitate the regulation of the watermarking properties. Experimental results show that the method achieves a high capacity (800 to 7000 bits per second) without significant perceptual distortion (ODG greater than -1) and is robust against common audio signal processing operations such as added noise, filtering, and MPEG compression (MP3).
Title: High Capacity Logarithmic Audio Watermarking Based on the Human Auditory System
Authors: Mehdi Fallahpour, D. Megías
DOI: 10.1109/ISM.2012.13 (https://doi.org/10.1109/ISM.2012.13)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
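The quantization idea can be illustrated with a generic quantization-index-modulation (QIM) sketch in a log-like domain. This is a hypothetical illustration rather than the authors' ATH-driven scheme: the function names, the `log1p` mapping, and the step size `delta` are all assumptions.

```python
import numpy as np

def embed_bits_qim(coeffs, bits, delta=0.5):
    """Illustrative QIM embedding (not the paper's exact algorithm):
    each coefficient magnitude, mapped to a log-like domain, is snapped
    to an even or odd multiple of `delta` depending on the bit."""
    marked = np.asarray(coeffs, dtype=float).copy()
    for i, bit in enumerate(bits):
        mag = np.log1p(abs(marked[i]))        # work in a log-like domain
        q = np.round(mag / delta)
        if int(q) % 2 != bit:                 # force parity to match the bit
            q += 1
        marked[i] = np.sign(marked[i]) * np.expm1(q * delta)
    return marked

def extract_bits_qim(coeffs, n_bits, delta=0.5):
    """Recover bits from the parity of the quantized log magnitudes."""
    return [int(np.round(np.log1p(abs(coeffs[i])) / delta)) % 2
            for i in range(n_bits)]
```

Because `log1p` and `expm1` are inverses, extraction recovers the quantizer index exactly for nonzero coefficients; robustness to noise then depends on the choice of `delta`, which plays the role of the scale factor mentioned in the abstract.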
Accurate and compact representation of music signals is a key component of large-scale content-based music applications such as music content management and near-duplicate audio detection. Despite many research efforts in this field, this problem is not yet well solved. In this paper, we suggest mid-level summarization of music signals based on chord progressions. More specifically, in our proposed algorithm, chord progressions are recognized from music signals with a supervised learning model, and recognition accuracy is improved by locally probing the n-best candidates. By investigating the properties of chord progressions, we further calculate a histogram over the probed chord progressions as a summary of the music signal. We show that chord progression-based summarization is a powerful feature descriptor for representing the harmonic progressions and tonal structures of music signals. The proposed algorithm is evaluated with content-based music retrieval as a typical application. The experimental results on a dataset of more than 70,000 songs confirm that our algorithm effectively improves the summarization accuracy of musical audio content and retrieval performance, and enhances music retrieval applications on large-scale audio databases.
Title: Recognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval
Authors: Yi Yu, Roger Zimmermann, Ye Wang, Vincent Oria
DOI: 10.1109/ISM.2012.10 (https://doi.org/10.1109/ISM.2012.10)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
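The histogram summary can be sketched, under simplifying assumptions, as plain n-gram counting over an already-recognized chord sequence; the paper's supervised recognizer and n-best probing are not reproduced here, and the function name is an assumption.

```python
from collections import Counter

def chord_progression_histogram(chords, n=2):
    """Summarize a chord label sequence as a normalized histogram of
    n-gram chord progressions (a simplified sketch of the mid-level
    summary idea)."""
    grams = [tuple(chords[i:i + n]) for i in range(len(chords) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}
```

For example, the sequence C, G, Am, F, C, G yields five bigrams, two of which are (C, G), so that progression receives weight 0.4; two such histograms can then be compared with any standard vector distance for retrieval.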
This paper presents an H.264-compatible temporal subband coding scheme for static background scenes of soccer video. We utilize orthonormal wavelet transforms to decompose a group of successive frames into temporal subbands. By exploiting the energy-conservation property of orthonormal wavelet transforms, we construct a rate-distortion model for optimal bit-rate allocation among the different subbands. To take advantage of the highly efficient video codec H.264/AVC, we encode each subband with H.264/AVC Fidelity Range Extensions (FRExt) intra-coding at the optimally assigned bit rates. The experimental results show that our proposed coding scheme outperforms conventional video coding with H.264/AVC in both subjective and objective evaluations.
Title: H.264-Compatible Coding of Background Soccer Video Using Temporal Subbands
Authors: Xiaohua Lu, Haopeng Li, M. Flierl
DOI: 10.1109/ISM.2012.34 (https://doi.org/10.1109/ISM.2012.34)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
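The energy-conservation property that underpins the rate-distortion model can be seen with one level of a temporal Haar transform, one possible orthonormal choice; the paper's actual wavelet and bit-allocation model are not reproduced here, and the function name is an assumption.

```python
import numpy as np

def temporal_haar_subbands(frames):
    """One level of an orthonormal temporal Haar transform over a group
    of frames stacked along axis 0 (the group size must be even). The
    1/sqrt(2) scaling makes the transform orthonormal, so total energy
    is preserved across the two subbands."""
    f = np.asarray(frames, dtype=float)
    low = (f[0::2] + f[1::2]) / np.sqrt(2.0)   # temporal low-pass subband
    high = (f[0::2] - f[1::2]) / np.sqrt(2.0)  # temporal high-pass subband
    return low, high
```

Since the squared norm of the input equals the sum of the squared norms of the subbands, distortion contributed by each subband adds up directly, which is what makes a per-subband rate-distortion allocation tractable.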
This paper presents the current state of our research project, which aims to give users a simple, easy-to-use, and computationally light way of creating mashups of lecture content within the Opencast Matterhorn lecture capture system. The system modifies the playback components of Matterhorn to deliver thin and light video clipping functionality without requiring the installation of any additional software. We plan to use the extensive logging framework built into Matterhorn to examine the effects of this tool on learner engagement.
Title: Thin and Light Video Editing Extensions for Education with Opencast Matterhorn
Authors: Greg Logan, J. Greer, G. McCalla
DOI: 10.1109/ISM.2012.95 (https://doi.org/10.1109/ISM.2012.95)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
Eliya Buyukkaya, Shakeel Ahmad, Muneeb Dawood, Jiayi Liu, Fen Zhou, R. Hamzaoui, G. Simon
We propose a peer-to-peer system for streaming user-generated live video. Peers are arranged in levels so that video is delivered at about the same time to all peers in the same level, and peers in a higher level watch the video before those in a lower level. We encode the video bit stream with rateless codes and use trees to transmit the encoded symbols. Trees are constructed to minimize the transmission rate of the source while maximizing the number of served peers and guaranteeing on-time delivery and reliability at the peers. We formulate this objective as a height-bounded spanning forest problem with nodal capacity constraints and compute a solution using a heuristic polynomial-time algorithm. We conduct ns-2 simulations to study the trade-off between used bandwidth and video quality for various packet loss rates and link latencies.
Title: Level-Based Peer-to-Peer Live Streaming with Rateless Codes
Authors: Eliya Buyukkaya, Shakeel Ahmad, Muneeb Dawood, Jiayi Liu, Fen Zhou, R. Hamzaoui, G. Simon
DOI: 10.1109/ISM.2012.54 (https://doi.org/10.1109/ISM.2012.54)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
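A minimal greedy sketch shows what a height-bounded tree with nodal capacity constraints looks like: attach each peer, in BFS order, to the first node that still has spare out-degree and sits above the height bound. This is a hypothetical illustration, not the authors' heuristic, and all names and parameters are assumptions.

```python
from collections import deque

def build_bounded_tree(source, peers, capacity, max_height):
    """Greedily build one tree rooted at `source`: each node may have at
    most `capacity` children (nodal capacity), and no peer may sit deeper
    than `max_height` (the height bound). Peers that cannot be placed are
    returned as unserved."""
    children = {source: []}
    depth = {source: 0}
    frontier = deque([source])      # nodes with possible spare capacity, BFS order
    unserved = []
    for peer in peers:
        placed = False
        while frontier:
            node = frontier[0]
            if len(children[node]) < capacity and depth[node] < max_height:
                children[node].append(peer)
                children[peer] = []
                depth[peer] = depth[node] + 1
                frontier.append(peer)
                placed = True
                break
            frontier.popleft()      # node is full or too deep; never reusable
        if not placed:
            unserved.append(peer)
    return children, unserved
```

With capacity 2 and height bound 2, a source can serve at most 2 + 4 = 6 peers; any further peers come back unserved, mirroring the tension between serving more peers and keeping delivery delay (tree height) bounded.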
A SOcial LIve Stream (SOLIS) is a live stream produced by a device whose owner shares the stream with her friends, allowing each friend to perform time-shifted viewing for a pre-specified duration. The system buffers this chase data to facilitate its browsing and display. In the presence of many SOLISs, memory may overflow and prevent the display of some chase data. This paper presents a novel data-aware admission control technique, DA-AdmCtrl, that summarizes chase data proactively to maximize the number of admissible SOLISs without memory overflow. It is designed for use with multi-core CPUs and maximizes the utility of data whenever the user's level of satisfaction (utility) with different data formats is available.
Title: A Data Aware Admission Control Technique for Social Live Streams (SOLISs)
Authors: Sumita Barahmand, Shahram Ghandeharizadeh
DOI: 10.1109/ISM.2012.68 (https://doi.org/10.1109/ISM.2012.68)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
Aiko Uemura, J. Katto, Kyota Higa, Masumi Ishikawa, T. Nomura
This paper presents a music part detection method incorporating chroma vector analysis for use with music TV programs. The envelopes of the chroma components of music signals tend to show horizontal (i.e., temporal) correlation in the time-frequency representation because music signals contain periodic chord sequences. Based on this observation, we analyze the time series of chroma components and attempt to segment the music parts of music TV programs from the other parts. Experimental results show an F-measure of 0.78, which is better than that obtained with a previous method.
Title: Music Part Segmentation in Music TV Programs Based on Chroma Vector Analysis
Authors: Aiko Uemura, J. Katto, Kyota Higa, Masumi Ishikawa, T. Nomura
DOI: 10.1109/ISM.2012.14 (https://doi.org/10.1109/ISM.2012.14)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
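The temporal-correlation cue can be sketched as a mean lag-1 autocorrelation across the bins of a chromagram: repetitive harmonic content yields high values, noise-like content low ones. This is a simplified stand-in, not the paper's analysis or segmentation logic, and the function name and input layout are assumptions.

```python
import numpy as np

def temporal_chroma_correlation(chroma):
    """Mean lag-1 temporal autocorrelation across the 12 bins of a
    (12, T) chromagram. High values suggest the repetitive harmonic
    structure typical of music parts."""
    c = np.asarray(chroma, dtype=float)
    x, y = c[:, :-1], c[:, 1:]                 # each bin vs. its one-frame shift
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean(axis=1, keepdims=True)
    num = (x * y).sum(axis=1)
    den = np.sqrt((x * x).sum(axis=1) * (y * y).sum(axis=1)) + 1e-12
    return float(np.mean(num / den))
```

A sliding-window version of this score, thresholded over time, would be one simple way to separate music segments from speech or noise segments in a broadcast.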
Yong Ju Jung, Seong-il Lee, Hosik Sohn, Yong Man Ro
Visual discomfort prediction is important for image safety issues in stereoscopic displays. This paper proposes automatic visualization of the perceived discomfort of stereoscopic video content. The proposed method makes effective use of saliency-based measures for visual importance analysis in video scenes. Based on this analysis, we quantify and visualize the visual discomfort induced by the disparity and motion characteristics of stereoscopic video content. The method outputs visual importance-based comfort maps that allow users to monitor which regions in each video frame are perceptually significant and problematic with respect to visual discomfort. Subjective assessments using various types of stereoscopic videos with diverse disparity and motion characteristics demonstrate the effectiveness of the proposed method.
Title: Visualizing the Perceived Discomfort of Stereoscopic Video
Authors: Yong Ju Jung, Seong-il Lee, Hosik Sohn, Yong Man Ro
DOI: 10.1109/ISM.2012.41 (https://doi.org/10.1109/ISM.2012.41)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
Hua Wang, D. Joshi, Jiebo Luo, Heng Huang, Minwoo Park
In recent years, several methods have been proposed to exploit image context (such as location), which provides valuable cues complementary to the image content; for example, image annotations and geotags have been shown to assist in the prediction of each other. To exploit the useful interrelatedness between these two heterogeneous prediction tasks, we propose a new correlation-guided structured sparse multi-task learning method. We utilize a joint classification and regression model to identify annotation-informative and geotag-relevant image features. We also introduce tree-structured sparsity regularizations into multi-task learning to integrate the label correlations in multi-label image annotation. Finally, we derive an efficient algorithm to optimize our non-smooth objective function. We demonstrate the performance of our method on three real-world geotagged multi-label image data sets for both semantic annotation and geotag prediction.
Title: Simultaneous Image Annotation and Geo-Tag Prediction via Correlation Guided Multi-task Learning
Authors: Hua Wang, D. Joshi, Jiebo Luo, Heng Huang, Minwoo Park
DOI: 10.1109/ISM.2012.21 (https://doi.org/10.1109/ISM.2012.21)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10
Facebook has rapidly become for many the dominant means of sharing images, and the number of shared images accessible to any given Facebook user is easily in the tens of thousands. The sheer volume of pictures relegates most to obscurity, yet some of those pictures would be of great interest, if a person could only find them. This research explores ways to harness the latent semantic information associated with pictures and interpersonal relationships to enable a person to browse for potentially interesting and germane images shared by people in their social network. While the possibilities for semantic analysis are endless, this work illustrates two possible approaches and highlights future potential applications of semantic understanding.
Title: Exploring Photos in Facebook
Authors: Mark D. Wood, Minwoo Park
DOI: 10.1109/ISM.2012.25 (https://doi.org/10.1109/ISM.2012.25)
Published in: 2012 IEEE International Symposium on Multimedia, 2012-12-10