
2012 IEEE International Symposium on Multimedia: Latest Papers

High Capacity Logarithmic Audio Watermarking Based on the Human Auditory System
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.13
Mehdi Fallahpour, D. Megías
This paper proposes a high-capacity audio watermarking algorithm that operates in the logarithm domain and is based on the absolute threshold of hearing (ATH) of the human auditory system (HAS), a combination that the authors present as novel. The key idea is to divide the selected frequency band into short frames and quantize the samples based on the HAS. Apart from remarkable capacity, transparency and robustness, this scheme provides three parameters (frequency band, scale factor, and frame size) which facilitate the regulation of the watermarking properties. The experimental results show that the method has a high capacity (800 to 7000 bits per second) without significant perceptual distortion (ODG greater than -1) and is robust against common audio signal processing such as added noise, filtering and MPEG compression (MP3).
Citations: 6
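The quantization step mentioned in the watermarking abstract above can be illustrated with a minimal sketch. It uses generic quantization index modulation (QIM) on the logarithm of a positive magnitude, which is an assumption and not necessarily the authors' scheme: the abstract ties the quantization to the ATH and exposes a frequency band, scale factor, and frame size, whereas the fixed step `delta` and all function names below are hypothetical.

```python
import math

def embed_bit(magnitude, bit, delta=0.5):
    """Embed one bit by quantizing log(magnitude) (generic QIM, assumed)."""
    log_mag = math.log(magnitude)
    # Even multiples of delta encode 0, odd multiples encode 1.
    k = round((log_mag - bit * delta) / (2 * delta))
    return math.exp(2 * delta * k + bit * delta)

def extract_bit(magnitude, delta=0.5):
    """Recover the bit from the parity of the nearest multiple of delta."""
    return round(math.log(magnitude) / delta) % 2

def embed_frame(magnitudes, bits, delta=0.5):
    """Embed one bit per sample of a short frame (hypothetical layout)."""
    return [embed_bit(m, b, delta) for m, b in zip(magnitudes, bits)]

def extract_frame(magnitudes, delta=0.5):
    """Extract one bit per sample of a frame."""
    return [extract_bit(m, delta) for m in magnitudes]

if __name__ == "__main__":
    frame = [3.7, 0.2, 12.5, 1.0]   # positive magnitudes, made up for the demo
    payload = [1, 0, 1, 1]
    marked = embed_frame(frame, payload)
    attacked = [m * 1.1 for m in marked]  # mild gain change
    print(extract_frame(marked), extract_frame(attacked))
```

In this sketch a bit survives any multiplicative change of the magnitude smaller than a factor of exp(delta / 2), so a larger step buys robustness at the cost of transparency.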
Recognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.10
Yi Yu, Roger Zimmermann, Ye Wang, Vincent Oria
Accurate and compact representation of music signals is a key component of large-scale content-based music applications such as music content management and near-duplicate audio detection. This problem is not yet well solved despite many research efforts in this field. In this paper, we suggest mid-level summarization of music signals based on chord progressions. More specifically, in our proposed algorithm, chord progressions are recognized from music signals based on a supervised learning model, and recognition accuracy is improved by locally probing n-best candidates. By investigating the properties of chord progressions, we further calculate a histogram from the probed chord progressions as a summary of the music signal. We show that the chord progression-based summarization is a powerful feature descriptor for representing harmonic progressions and tonal structures of music signals. The proposed algorithm is evaluated with content-based music retrieval as a typical application. The experimental results on a dataset with more than 70,000 songs confirm that our algorithm can effectively improve summarization accuracy of musical audio contents and retrieval performance, and enhance music retrieval applications on large-scale audio databases.
Citations: 4
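The histogram summary described in the chord progression abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' feature: the chord labels are taken as already recognized, the n-best probing step is omitted, and the bigram window, the collapsing of repeated chords, and the histogram-intersection similarity are all hypothetical choices.

```python
from collections import Counter

def progression_histogram(chords, n=2):
    """Normalized histogram of n-chord progressions in a chord sequence."""
    # Collapse consecutive repeats so a held chord counts as one event.
    collapsed = [c for i, c in enumerate(chords) if i == 0 or c != chords[i - 1]]
    grams = [tuple(collapsed[i:i + n]) for i in range(len(collapsed) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical summaries, 0.0 for disjoint."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

if __name__ == "__main__":
    song_a = ["C", "C", "G", "Am", "F", "C", "G"]
    song_b = ["C", "G", "Am", "F"]
    ha = progression_histogram(song_a)
    hb = progression_histogram(song_b)
    print(ha)
    print(histogram_similarity(ha, hb))
```

A fixed-size summary of this kind can be compared between songs without aligning them in time, which is one reason a histogram is attractive for retrieval over a large collection.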
H.264-Compatible Coding of Background Soccer Video Using Temporal Subbands
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.34
Xiaohua Lu, Haopeng Li, M. Flierl
This paper presents an H.264-compatible temporal subband coding scheme for static background scenes of soccer video. We utilize orthonormal wavelet transforms to decompose a group of successive frames into temporal subbands. By exploiting the property of energy conservation of orthonormal wavelet transforms, we construct a rate-distortion model for optimal bit rate allocation among different subbands. To take advantage of the high-efficiency video codec H.264/AVC, we encode each subband with H.264/AVC Fidelity Range Extension (FRExt) intra-coding by assigning optimal bit rates. The experimental results show that our proposed coding scheme outperforms conventional video coding with H.264/AVC for both subjective and objective evaluations.
Citations: 0
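The energy-conservation property that the subband coding abstract above relies on can be checked with a small sketch. It assumes a single-level orthonormal Haar transform along the time axis, which is only one possible orthonormal wavelet; the abstract does not say which transform or how many levels the authors use, and the frame data and function names are made up.

```python
import math

def haar_temporal(frames):
    """One level of orthonormal Haar filtering across pairs of frames.

    frames: an even number of frames, each a flat list of pixel values.
    Returns (low, high) temporal subbands, one subband frame per input pair.
    """
    s = 1.0 / math.sqrt(2.0)
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) * s for x, y in zip(a, b)])
        high.append([(x - y) * s for x, y in zip(a, b)])
    return low, high

def energy(frames):
    """Sum of squared sample values over all frames."""
    return sum(v * v for f in frames for v in f)

if __name__ == "__main__":
    # Four nearly identical frames, as in a static background scene.
    frames = [[10, 20, 30], [12, 18, 30], [11, 21, 29], [11, 19, 31]]
    low, high = haar_temporal(frames)
    print(energy(frames), energy(low) + energy(high))
    print(energy(high))
```

Because the transform is orthonormal, squared error introduced in the subbands equals squared error in the reconstructed frames, so distortion can be budgeted per subband. For a static background almost all energy lands in the low band, which suggests why uneven bit allocation pays off.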
Thin and Light Video Editing Extensions for Education with Opencast Matterhorn
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.95
Greg Logan, J. Greer, G. McCalla
This paper presents the current state of our research project, which aims to give users a simple, easy-to-use, and computationally light way of creating mashups of lecture content within the Opencast Matterhorn lecture capture system. The system modifies the playback components of Matterhorn to deliver thin and light video clipping functionality without requiring installation of any additional software. We plan to make use of the extensive logging framework built into Matterhorn to examine the effects of this tool on learner engagement.
Citations: 1
Level-Based Peer-to-Peer Live Streaming with Rateless Codes
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.54
Eliya Buyukkaya, Shakeel Ahmad, Muneeb Dawood, Jiayi Liu, Fen Zhou, R. Hamzaoui, G. Simon
We propose a peer-to-peer system for streaming user-generated live video. Peers are arranged in levels so that video is delivered at about the same time to all peers in the same level, and peers in a higher level watch the video before those in a lower level. We encode the video bit stream with rateless codes and use trees to transmit the encoded symbols. Trees are constructed to minimize the transmission rate for the source while maximizing the number of served peers and guaranteeing on-time delivery and reliability at the peers. We formulate this objective as a height-bounded spanning forest problem with nodal capacity constraints and compute a solution using a heuristic polynomial-time algorithm. We conduct ns-2 simulations to study the trade-off between used bandwidth and video quality for various packet loss rates and link latencies.
Citations: 4
A Data Aware Admission Control Technique for Social Live Streams (SOLISs)
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.68
Sumita Barahmand, Shahram Ghandeharizadeh
A SOcial LIve Stream (SOLIS) is a live stream produced by a device whose owner shares the stream with her friends, allowing each friend to perform time-shifted viewing for a pre-specified duration. The system buffers this chase data to facilitate its browsing and display. In the presence of many SOLISs, memory may overflow and prevent display of some chase data. This paper presents a novel data-aware admission control technique, DA-AdmCtrl, that summarizes chase data proactively to maximize the number of admissible SOLISs with no memory overflow. It is designed for use with multi-core CPUs and maximizes the utility of data whenever the user's level of satisfaction (utility) with different data formats is available.
Citations: 0
Music Part Segmentation in Music TV Programs Based on Chroma Vector Analysis
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.14
Aiko Uemura, J. Katto, Kyota Higa, Masumi Ishikawa, T. Nomura
This paper presents a music part detection method incorporating chroma vector analysis for use with music TV programs. Results show that envelopes of chroma components of music signals tend to have horizontal (i.e., temporal) correlation in the time-frequency representation because music signals have periodic chord sequences. Based on this fact, we analyze time series of chroma components and attempt to segment music parts in music TV programs from other parts. Experimental results show an F-measure of 0.78, which is better than that obtained using the previous method.
Citations: 1
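The temporal correlation of chroma envelopes mentioned in the music segmentation abstract above can be sketched with a simple score. The measure below, the mean lag-1 autocorrelation of each chroma bin over time, is an assumption chosen for illustration; the abstract does not specify the authors' analysis, and the toy two-bin frames stand in for real 12-bin chroma vectors.

```python
def lag_autocorrelation(series, lag=1):
    """Normalized autocorrelation of one chroma bin's envelope at a given lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(v * v for v in centered)
    if denom == 0:
        return 0.0  # a constant envelope carries no correlation evidence
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom

def temporal_correlation_score(chroma_frames, lag=1):
    """Average the per-bin autocorrelation over all chroma bins."""
    bins = list(zip(*chroma_frames))  # transpose: one time series per bin
    return sum(lag_autocorrelation(list(b), lag) for b in bins) / len(bins)

if __name__ == "__main__":
    # Held chords: each bin stays active for several consecutive frames.
    music_like = [[1, 0]] * 4 + [[0, 1]] * 4
    # Rapidly alternating energy, standing in for non-music content.
    other = [[1, 0], [0, 1]] * 4
    print(temporal_correlation_score(music_like))
    print(temporal_correlation_score(other))
```

A segmenter could threshold such a score over a sliding window to separate music parts from other parts; the threshold and window length would have to be tuned on labeled data.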
Visualizing the Perceived Discomfort of Stereoscopic Video
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.41
Yong Ju Jung, Seong-il Lee, Hosik Sohn, Yong Man Ro
Visual discomfort prediction is important for image safety in stereoscopic displays. This paper proposes automatic visualization of the perceived discomfort of stereoscopic video content. The proposed method makes effective use of saliency-based measures for visual importance analysis in video scenes. Based on the analysis of visual importance, we quantify and visualize the visual discomfort induced by the disparity and motion characteristics of stereoscopic video content. The proposed method outputs visual importance-based comfort maps that allow users to monitor which regions in each video frame are perceptually significant and problematic with respect to visual discomfort. Subjective assessments on various types of stereoscopic videos with diverse disparity and motion characteristics demonstrate the effectiveness of the proposed method.
Citations: 1
Simultaneous Image Annotation and Geo-Tag Prediction via Correlation Guided Multi-task Learning
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.21
Hua Wang, D. Joshi, Jiebo Luo, Heng Huang, Minwoo Park
In recent years, several methods have been proposed to exploit image context (such as location) that provides valuable cues complementary to the image content; in particular, image annotations and geotags have been shown to assist the prediction of each other. To exploit the useful interrelatedness between these two heterogeneous prediction tasks, we propose a new correlation guided structured sparse multi-task learning method. We utilize a joint classification and regression model to identify annotation-informative and geotag-relevant image features. We also introduce tree-structured sparsity regularizations into multi-task learning to integrate the label correlations in multi-label image annotation. Finally, we derive an efficient algorithm to optimize our non-smooth objective function. We demonstrate the performance of our method on three real-world geotagged multi-label image data sets for both semantic annotation and geotag prediction.
Citations: 2
Exploring Photos in Facebook
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.25
Mark D. Wood, Minwoo Park
Facebook has rapidly become for many the dominant means for sharing images, and the number of shared images accessible to any given Facebook user is easily in the tens of thousands. The sheer volume of pictures relegates most to obscurity, yet some of those pictures would be of great interest, if a person could only find them. This research explores ways to harness latent semantic information associated with pictures and interpersonal relationships to enable a person to browse for potentially interesting and germane images shared by people in their social network. The possibilities for semantic analysis are endless; this work illustrates two possible approaches while also highlighting future potential applications of semantic understanding.
Citations: 1