
Latest publications: Proceedings of the 21st ACM international conference on Multimedia

GLocal structural feature selection with sparsity for multimedia data understanding
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502142
Yan Yan, Zhongwen Xu, Gaowen Liu, Zhigang Ma, N. Sebe
The selection of discriminative features is an important and effective technique for many multimedia tasks. Using irrelevant features in classification or clustering tasks can degrade performance, so designing efficient feature selection algorithms to remove irrelevant features is one way to improve classification or clustering performance. With the successful use of sparse models in image and video classification and understanding, imposing structural sparsity in feature selection has been widely investigated in recent years. Motivated by the merits of sparse models, we propose a novel feature selection method using a sparse model. Different from the state of the art, our method is built upon the $\ell_{2,p}$-norm and simultaneously considers both the global and local (GLocal) structures of the data distribution. Our method is more flexible in selecting discriminative features, as it is able to control the degree of sparseness. Moreover, considering both global and local structures of the data distribution makes the feature selection process more effective. We propose an efficient algorithm to solve the resulting $\ell_{2,p}$-norm sparsity optimization problem. Experimental results on real-world image and video datasets show the effectiveness of our feature selection method compared to several state-of-the-art methods.
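As a rough illustration of the row-sparsity idea behind an $\ell_{2,p}$-norm regularizer (this is not the authors' algorithm), the sketch below computes $\|W\|_{2,p}^p$ for a projection matrix $W$ and ranks features by the $\ell_2$ norms of their rows; the matrix values are made up for the example:

```python
import numpy as np

def l2p_norm(W, p):
    """||W||_{2,p}^p: sum over rows of (row l2-norm)^p; smaller p encourages
    stronger row sparsity, i.e. fewer selected features."""
    return float(np.sum(np.linalg.norm(W, axis=1) ** p))

def select_features(W, k):
    """Score feature i by the l2 norm of row i of W; keep the top-k features."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]

# toy projection matrix: feature 0 dominates, the rest are near zero
W = np.array([[3.0, 4.0],
              [0.1, 0.0],
              [0.0, 0.2],
              [0.05, 0.05]])
norm_p1 = l2p_norm(W, p=1.0)      # row norms 5.0, 0.1, 0.2, 0.05*sqrt(2)
top2 = select_features(W, 2)      # features 0 and 2 have the largest rows
```

Varying `p` in $(0, 1]$ is what the abstract means by controlling the degree of sparseness.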
Citations: 16
An efficient image homomorphic encryption scheme with small ciphertext expansion
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502105
Peijia Zheng, Jiwu Huang
The field of image processing in the encrypted domain has received increasing attention for its extensive potential applications, for example, providing efficient and secure solutions for privacy-preserving applications in untrusted environments. One obstacle to the widespread use of these techniques is the ciphertext expansion of several orders of magnitude caused by existing homomorphic encryption schemes. In this paper, we provide a way to tackle this issue for image processing in the encrypted domain. By using characteristics of the image format, we develop an image encryption scheme that limits ciphertext expansion while preserving the homomorphic property. The proposed scheme first encrypts image pixels with an existing probabilistic homomorphic cryptosystem, and then compresses the whole encrypted image in order to save storage space. Our scheme has a much smaller ciphertext expansion factor than element-wise encryption while preserving the homomorphic property, and no additional interactive protocols are required when applying secure signal processing tools to the compressed encrypted image. We present a fast algorithm for the encryption and compression steps, which speeds up the computation and makes our scheme much more efficient. We also analyze the security, ciphertext expansion ratio, and computational complexity of the scheme. Our experiments demonstrate the validity of the proposed algorithms, and the scheme is suitable as an image encryption method for applications in secure image processing.
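The abstract does not name the "existing probabilistic homomorphic cryptosystem"; Paillier is a common choice in this literature. A toy Paillier sketch (deliberately tiny, insecure key size, illustration only) showing the additive homomorphism that makes encrypted-domain pixel processing possible:

```python
import random
from math import gcd

# Toy Paillier cryptosystem (insecure parameters, illustration only).
p, q = 1789, 1879                      # small demo primes; real keys are ~2048-bit
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1                              # standard generator choice

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)    # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:              # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: a product of ciphertexts decrypts to the sum of the
# plaintexts, so encrypted pixel values can be combined without decryption.
a, b = 120, 77                         # two pixel values
c_sum = (encrypt(a) * encrypt(b)) % n2
```

Note the expansion the abstract targets: each plaintext lives mod `n` but each ciphertext lives mod `n**2`, roughly doubling the bit length per encrypted element even at this toy size.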
Citations: 39
Real-time salient object detection
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502240
Chia-Ju Lu, Chih-Fan Hsu, Mei-Chen Yeh
Salient object detection techniques have a variety of multimedia applications of broad interest. However, the detection must be fast to truly aid in these processes. There exist many robust algorithms tackling the salient object detection problem but most of them are computationally demanding. In this demonstration we show a fast salient object detection system implemented in a conventional PC environment. We examine the challenges faced in the design and development of a practical system that can achieve accurate detection in real-time.
Citations: 5
Classifying tag relevance with relevant positive and negative examples
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502129
Xirong Li, Cees G. M. Snoek
Image tag relevance estimation aims to automatically determine whether what people label about images is factually present in the pictorial content. Different from previous works, which either use only positive examples of a given tag or use positive and random negative examples, we argue for the importance of relevant positive and relevant negative examples for tag relevance estimation. We propose a system that selects the positive and negative examples deemed most relevant with respect to the given tag from crowd-annotated images. While applying models for many tags could be cumbersome, our system trains efficient ensembles of Support Vector Machines per tag, enabling fast classification. Experiments on two benchmark sets show that the proposed system compares favorably against five present-day methods. Given extracted visual features, our system can process up to 3,787 tags per second for each image. The new system is both effective and efficient for tag relevance estimation.
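A minimal sketch of the general idea of training one fast linear classifier per tag from selected positive and negative examples. This uses a tiny Pegasos-style SVM on synthetic 2-D data; it is not the authors' system (their features, relevance-based example selection, and ensembling are not reproduced here):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Tiny Pegasos-style linear SVM with hinge loss; labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:        # margin violation: pull toward x_i
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                 # only shrink (regularization)
                w = (1 - lr * lam) * w
    return w, b

# stand-ins for "relevant positive" and "relevant negative" examples of one tag
X_pos = np.random.default_rng(1).normal(+1.0, 0.3, size=(20, 2))
X_neg = np.random.default_rng(2).normal(-1.0, 0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

Because prediction per tag is a single dot product, scoring thousands of tags per image reduces to one matrix-vector product, which is what makes throughputs like the quoted 3,787 tags per second plausible.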
Citations: 40
Facilitating fashion camouflage art
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502121
Ranran Feng, B. Prabhakaran
Artists and fashion designers have recently been creating a new form of art -- Camouflage Art -- which can be used to prevent computer vision algorithms from detecting faces. This digital art technique combines makeup and hair styling, or other modifications such as facial painting, to help avoid automatic face detection. In this paper, we first study camouflage interference and its effectiveness against several current state-of-the-art techniques in face detection and recognition, and then present a tool that facilitates digital art design for camouflage that can fool these computer vision algorithms. The tool can find the prominent or decisive features in facial images that constitute the face being recognized, and it gives suggestions for camouflage options (makeup, styling, paints) on particular facial features or facial parts. Testing of this tool shows that it can effectively aid artists or designers in creating camouflage-thwarting designs. An evaluation of the suggested camouflages applied to 40 celebrities across eight different face recognition systems (both non-commercial and commercial) shows that 82.5%~100% of the time the subject is unrecognizable using the suggested camouflage.
Citations: 24
Session details: Best paper session
Pub Date : 2013-10-21 DOI: 10.1145/3245285
R. Zimmerman
Citations: 0
Spatio-temporal Fisher vector coding for surveillance event detection
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502155
Qiang Chen, Yang Cai, L. Brown, A. Datta, Quanfu Fan, R. Feris, Shuicheng Yan, Alexander Hauptmann, Sharath Pankanti
We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatio-temporal features applied to the seven event classes defined by the SED task. The approach is based on local spatio-temporal descriptors, called MoSIFT, generated from pair-wise video frames. A Gaussian Mixture Model (GMM) is learned to model the distribution of the low-level features. Then, for each sliding window, the improved Fisher vector encoding is used to generate the sample representation, and a model is learned using a linear SVM for each event. The main novelty of our system is the introduction of Fisher vector encoding into video event detection. Fisher vector encoding has demonstrated great success in image classification. The key idea is to model the low-level visual features with a Gaussian Mixture Model and to generate an intermediate vector representation for a bag of features; FV encoding uses higher-order statistics in place of the histograms of the standard BoW. FV has several good properties: (a) it can naturally separate video-specific information from noisy local features, and (b) we can use a linear model on this representation. We build an efficient implementation of FV encoding which can attain a 10-times speed-up over real time. We also take advantage of non-trivial object localization techniques to feed into video event detection, e.g., multi-scale detection and non-maximum suppression. This approach outperformed the submissions of all other teams in TRECVID SED 2012 on four of the seven event types.
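A compact sketch of (improved) Fisher vector encoding for a diagonal-covariance GMM, with the power and l2 normalizations the "improved FV" adds; the GMM parameters and descriptors below are synthetic stand-ins for trained models and MoSIFT features:

```python
import numpy as np

def fisher_vector(X, w, mu, var):
    """Improved Fisher vector for a diagonal-covariance GMM (sketch).
    X: (N, D) local descriptors; w: (K,) mixture weights; mu, var: (K, D)."""
    N = X.shape[0]
    diff = (X[:, None, :] - mu[None]) / np.sqrt(var[None])          # (N, K, D)
    log_gauss = -0.5 * (diff ** 2 + np.log(2 * np.pi * var[None])).sum(-1)
    log_post = np.log(w)[None] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)                  # stabilize
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # (N, K) responsibilities
    # gradients w.r.t. GMM means and (diagonal) variances
    g_mu = (gamma[..., None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
    g_var = (gamma[..., None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])
    fv = np.hstack([g_mu.ravel(), g_var.ravel()])                    # 2*K*D dims
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                         # l2 normalization

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))      # stand-in local descriptors for one window
w = np.array([0.5, 0.5])
mu = rng.normal(size=(2, 16))
var = np.ones((2, 16))
fv = fisher_vector(X, w, mu, var)   # 2 * K * D = 64-dimensional representation
```

The higher-order statistics (`g_mu`, `g_var`) are what replace the zeroth-order counts of a BoW histogram, and the resulting vector feeds directly into a linear SVM.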
Citations: 16
Object co-segmentation via discriminative low rank matrix recovery
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2502195
Yong Li, J. Liu, Zechao Li, Yang Liu, Hanqing Lu
The goal of this paper is to simultaneously segment the object regions appearing in a set of images of the same object class, known as object co-segmentation. Different from typical methods, which simply assume that the regions common among images are the object regions, we additionally consider the disturbance from consistent backgrounds, and take not only common regions but also salient ones among the images to be the object regions. To this end, we propose a Discriminative Low Rank matrix Recovery (DLRR) algorithm to divide the over-segmented regions (i.e., superpixels) of a given image set into object and non-object ones. In DLRR, a low-rank matrix recovery term is adopted to detect salient regions in an image, while a discriminative learning term is used to distinguish the object regions from all the superpixels. An additional regularization term is introduced to jointly measure the disagreement between the predicted saliency and the objectness probability corresponding to each superpixel of the image set. For the unified learning problem formed by connecting the above three terms, we design an efficient optimization procedure based on block-coordinate descent. Extensive experiments are conducted on two public datasets, MSRC and iCoseg, and comparisons with several state-of-the-art methods demonstrate the effectiveness of our work.
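The abstract does not give the DLRR objective, but the low-rank matrix recovery ingredient can be illustrated with the standard robust PCA decomposition: a "consistent background" is modeled as a low-rank matrix and deviations (candidate salient regions) as sparse outliers. The sketch below is a basic inexact-ALM loop on a toy matrix, a generic technique and not the authors' formulation:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Entrywise soft-thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(D, iters=300):
    """Decompose D ~= L (low-rank) + S (sparse) with an inexact ALM loop."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))            # standard sparsity weight
    mu = 1.25 / np.linalg.norm(D, 2)          # penalty parameter, grown each step
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                      # Lagrange multipliers
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(D - L + Y / mu, lam / mu)  # sparse update
        Y += mu * (D - L - S)                 # dual ascent on the residual
        mu = min(mu * 1.05, 1e7)
    return L, S

# toy: a rank-1 "consistent background" plus a few gross outliers
rng = np.random.default_rng(0)
L0 = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 20))
S0 = np.zeros((20, 20))
S0[3, 7], S0[11, 2] = 10.0, -8.0
Lhat, Shat = rpca(L0 + S0)
```

On this easy instance the loop pulls the two injected outliers into `Shat` while `Lhat` stays close to the rank-1 part.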
Citations: 7
Multimedia framed
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2512088
E. Churchill
Multimedia is the combination of several media forms; more typically, the word implies sound and full-motion video. While multimedia technologists concern themselves with the production and distribution of multimedia artifacts themselves, information designers, educationalists, and artists are more concerned with the reception of the artifact, and consider multimedia to be another representational format for multimodal information presentation. Such a perspective leads to questions such as: Is text, audio, video, or a combination of all three the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead or in addition? How does the setting affect perception and reception? Is the artifact interactive? Is it changed by audience members? Understanding how an artifact is perceived, received, and interacted with is central to understanding what multimedia is, opening up possibilities and posing technical challenges as we imagine new forms and formats of multimedia experience. In this talk, I will illustrate how content understanding is modulated by context, by the “framing” of the content. I will discuss audience-participatory production of multimedia and multimodal experiences. I will conclude with some technical excitements, design and development challenges, and experiential possibilities that lie ahead.
Citations: 0
Scalable training with approximate incremental laplacian eigenmaps and PCA
Pub Date : 2013-10-21 DOI: 10.1145/2502081.2508124
Eleni Mantziou, S. Papadopoulos, Y. Kompatsiaris
The paper describes the approach, the experimental settings, and the results obtained by the proposed methodology at the ACM Yahoo! Multimedia Grand Challenge. Its main contribution is the use of fast and efficient features with a highly scalable semi-supervised learning approach, the Approximate Laplacian Eigenmaps (ALEs), and its extension, by computing the test set incrementally for learning concepts in time linear in the number of images (both labelled and unlabelled). A combination of two local visual features with the VLAD feature aggregation method and PCA is used to improve the efficiency and time complexity. Our methodology achieves somewhat better accuracy compared to the baseline (linear SVM) on small training sets, and the performance improves further as the training data increase. Performing ALE fusion on a training set of 50K/concept resulted in a MiAP score of 0.4223, which was among the highest scores of the proposed approach.
{"title":"Scalable training with approximate incremental laplacian eigenmaps and PCA","authors":"Eleni Mantziou, S. Papadopoulos, Y. Kompatsiaris","doi":"10.1145/2502081.2508124","DOIUrl":"https://doi.org/10.1145/2502081.2508124","url":null,"abstract":"The paper describes the approach, the experimental settings, and the results obtained by the proposed methodology at the ACM Yahoo! Multimedia Grand Challenge. Its main contribution is the use of fast and efficient features with a highly scalable semi-supervised learning approach, the Approximate Laplacian Eigenmaps (ALEs), and its extension, by computing the test set incrementally for learning concepts in time linear to the number of images (both labelled and unlabelled). A combination of two local visual features combined with the VLAD feature aggregation method and PCA is used to improve the efficiency and time complexity. Our methodology achieves somewhat better accuracy compared to the baseline (linear SVM) in small training sets, but improves the performance as the training data increase. Performing ALE fusion on a training set of 50K/concept resulted in a MiAP score of 0.4223, which was among the highest scores of the proposed approach.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"286 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77080075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
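The entry above builds on Laplacian eigenmaps over an image-similarity graph. As a rough illustration only — this is the standard (batch) algorithm, not the paper's approximate/incremental ALE variant, and the function name and parameters are hypothetical — a minimal NumPy sketch might look like this: build a kNN graph with heat-kernel weights, form the graph Laplacian, and take its smallest nontrivial eigenvectors as the embedding.

```python
import numpy as np

def laplacian_eigenmaps(X, k_neighbors=5, dim=2):
    """Embed rows of X into `dim` dimensions via standard Laplacian eigenmaps."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    # kNN adjacency with heat-kernel weights (bandwidth from the median distance).
    sigma2 = np.median(d2[np.isfinite(d2)])
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[:k_neighbors]
        W[i, idx] = np.exp(-d2[i, idx] / sigma2)
    W = np.maximum(W, W.T)          # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                       # unnormalized graph Laplacian
    # Smallest eigenvectors of L; skip the trivial constant eigenvector.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]
```

The paper's contribution is precisely to avoid the O(n^3) eigendecomposition above by approximating the eigenvectors and folding in test images incrementally, so the batch step shown here is what ALE replaces.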