GLocal structural feature selection with sparsity for multimedia data understanding. Yan Yan, Zhongwen Xu, Gaowen Liu, Zhigang Ma, N. Sebe. doi:10.1145/2502081.2502142
The selection of discriminative features is an important and effective technique for many multimedia tasks. Using irrelevant features in classification or clustering tasks can deteriorate performance, so designing efficient feature selection algorithms to remove irrelevant features is a practical way to improve classification or clustering performance. With the successful use of sparse models in image and video classification and understanding, imposing structural sparsity in feature selection has been widely investigated in recent years. Motivated by the merits of sparse models, we propose a novel feature selection method using a sparse model. Different from the state of the art, our method is built upon the $\ell_{2,p}$-norm and simultaneously considers both the global and local (GLocal) structures of the data distribution. Our method is more flexible in selecting discriminative features, as it is able to control the degree of sparseness. Moreover, considering both global and local structures of the data distribution makes the feature selection process more effective. An efficient algorithm is proposed to solve the $\ell_{2,p}$-norm sparsity optimization problem. Experimental results on real-world image and video datasets show the effectiveness of our feature selection method compared to several state-of-the-art methods.
{"title":"GLocal structural feature selection with sparsity for multimedia data understanding","authors":"Yan Yan, Zhongwen Xu, Gaowen Liu, Zhigang Ma, N. Sebe","doi":"10.1145/2502081.2502142","DOIUrl":"https://doi.org/10.1145/2502081.2502142","url":null,"abstract":"The selection of discriminative features is an important and effective technique for many multimedia tasks. Using irrelevant features in classification or clustering tasks could deteriorate the performance. Thus, designing efficient feature selection algorithms to remove the irrelevant features is a possible way to improve the classification or clustering performance. With the successful usage of sparse models in image and video classification and understanding, imposing structural sparsity in emph{feature selection} has been widely investigated during the past years. Motivated by the merit of sparse models, we propose a novel feature selection method using a sparse model in this paper. Different from the state of the art, our method is built upon $ell _{2,p}$-norm and simultaneously considers both the global and local (GLocal) structures of data distribution. Our method is more flexible in selecting the discriminating features as it is able to control the degree of sparseness. Moreover, considering both global and local structures of data distribution makes our feature selection process more effective. An efficient algorithm is proposed to solve the $ell_{2,p}$-norm sparsity optimization problem in this paper. Experimental results performed on real-world image and video datasets show the effectiveness of our feature selection method compared to several state-of-the-art methods.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"2013 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89615655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient image homomorphic encryption scheme with small ciphertext expansion. Peijia Zheng, Jiwu Huang. doi:10.1145/2502081.2502105
Image processing in the encrypted domain has received increasing attention for its extensive potential applications, for example, providing efficient and secure solutions for privacy-preserving applications in untrusted environments. One obstacle to the widespread use of these techniques is the ciphertext expansion of several orders of magnitude caused by existing homomorphic encryption schemes. In this paper, we provide a way to tackle this issue for image processing in the encrypted domain. By exploiting characteristics of the image format, we develop an image encryption scheme that limits ciphertext expansion while preserving the homomorphic property. The proposed scheme first encrypts image pixels with an existing probabilistic homomorphic cryptosystem, and then compresses the whole encrypted image to save storage space. Our scheme has a much smaller ciphertext expansion factor than element-wise encryption while preserving the homomorphic property, and no additional interactive protocols are required when applying secure signal processing tools to the compressed encrypted image. We also present a fast algorithm for the encryption and compression steps, which speeds up the computation and makes the scheme much more efficient, and we analyze the security, ciphertext expansion ratio, and computational complexity. Our experiments demonstrate the validity of the proposed algorithms; the scheme is well suited as an image encryption method for secure image processing applications.
{"title":"An efficient image homomorphic encryption scheme with small ciphertext expansion","authors":"Peijia Zheng, Jiwu Huang","doi":"10.1145/2502081.2502105","DOIUrl":"https://doi.org/10.1145/2502081.2502105","url":null,"abstract":"The field of image processing in the encrypted domain has been given increasing attention for the extensive potential applications, for example, providing efficient and secure solutions for privacy-preserving applications in untrusted environment. One obstacle to the widespread use of these techniques is the ciphertext expansion of high orders of magnitude caused by the existing homomorphic encryptions. In this paper, we provide a way to tackle this issue for image processing in the encrypted domain. By using characteristics of image format, we develop an image encryption scheme to limit ciphertext expansion while preserving the homomorphic property. The proposed encryption scheme first encrypts image pixels with an existing probabilistic homomorphic cryptosystem, and then compresses the whole encrypted image in order to save storage space. Our scheme has a much smaller ciphertext expansion factor compared with the element-wise encryption scheme, while preserving the homomorphic property. It is not necessary to require additional interactive protocols when applying secure signal processing tools to the compressed encrypted image. We present a fast algorithm for the encryption and the compression of the proposed image encryption scheme, which speeds up the computation and makes our scheme much more efficient. The analysis on the security, ciphertext expansion ratio, and computational complexity are also conducted. Our experiments demonstrate the validity of the proposed algorithms. The proposed scheme is suitable to be employed as an image encryption method for the applications in secure image processing.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"309 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91457979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time salient object detection. Chia-Ju Lu, Chih-Fan Hsu, Mei-Chen Yeh. doi:10.1145/2502081.2502240
Salient object detection techniques have a variety of multimedia applications of broad interest. However, detection must be fast to be truly useful in these applications. Many robust algorithms tackle the salient object detection problem, but most of them are computationally demanding. In this demonstration we show a fast salient object detection system implemented in a conventional PC environment, and we examine the challenges faced in the design and development of a practical system that achieves accurate detection in real time.
{"title":"Real-time salient object detection","authors":"Chia-Ju Lu, Chih-Fan Hsu, Mei-Chen Yeh","doi":"10.1145/2502081.2502240","DOIUrl":"https://doi.org/10.1145/2502081.2502240","url":null,"abstract":"Salient object detection techniques have a variety of multimedia applications of broad interest. However, the detection must be fast to truly aid in these processes. There exist many robust algorithms tackling the salient object detection problem but most of them are computationally demanding. In this demonstration we show a fast salient object detection system implemented in a conventional PC environment. We examine the challenges faced in the design and development of a practical system that can achieve accurate detection in real-time.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80694623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classifying tag relevance with relevant positive and negative examples. Xirong Li, Cees G. M. Snoek. doi:10.1145/2502081.2502129
Image tag relevance estimation aims to automatically determine whether what people label about images is factually present in the pictorial content. Different from previous works, which either use only positive examples of a given tag or use positive and random negative examples, we argue for the importance of relevant positive and relevant negative examples for tag relevance estimation. We propose a system that selects the positive and negative examples deemed most relevant with respect to the given tag from crowd-annotated images. While applying models for many tags could be cumbersome, our system trains efficient ensembles of Support Vector Machines per tag, enabling fast classification. Experiments on two benchmark sets show that the proposed system compares favorably against five present-day methods. Given extracted visual features, our system can process up to 3,787 tags per second for each image. The new system is both effective and efficient for tag relevance estimation.
{"title":"Classifying tag relevance with relevant positive and negative examples","authors":"Xirong Li, Cees G. M. Snoek","doi":"10.1145/2502081.2502129","DOIUrl":"https://doi.org/10.1145/2502081.2502129","url":null,"abstract":"Image tag relevance estimation aims to automatically determine what people label about images is factually present in the pictorial content. Different from previous works, which either use only positive examples of a given tag or use positive and random negative examples, we argue the importance of relevant positive and relevant negative examples for tag relevance estimation. We propose a system that selects positive and negative examples, deemed most relevant with respect to the given tag from crowd-annotated images. While applying models for many tags could be cumbersome, our system trains efficient ensembles of Support Vector Machines per tag, enabling fast classification. Experiments on two benchmark sets show that the proposed system compares favorably against five present day methods. Given extracted visual features, for each image our system can process up to 3,787 tags per second. The new system is both effective and efficient for tag relevance estimation.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80336799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facilitating fashion camouflage art. Ranran Feng, B. Prabhakaran. doi:10.1145/2502081.2502121
Artists and fashion designers have recently been creating a new form of art, Camouflage Art, which can be used to prevent computer vision algorithms from detecting faces. This digital art technique combines makeup and hair styling, or other modifications such as facial painting, to help avoid automatic face detection. In this paper, we first study camouflage interference and its effectiveness against several state-of-the-art techniques in face detection and recognition, and then present a tool that facilitates the design of such camouflage to fool these computer vision algorithms. The tool finds the prominent or decisive features of a facial image that drive recognition, and suggests camouflage options (makeup, styling, paints) for particular facial features or parts. Testing shows that the tool effectively aids artists and designers in creating camouflage-thwarting designs. An evaluation of the suggested camouflage applied to 40 celebrities across eight face recognition systems (both non-commercial and commercial) shows that the subject is unrecognizable 82.5% to 100% of the time.
{"title":"Facilitating fashion camouflage art","authors":"Ranran Feng, B. Prabhakaran","doi":"10.1145/2502081.2502121","DOIUrl":"https://doi.org/10.1145/2502081.2502121","url":null,"abstract":"Artists and fashion designers have recently been creating a new form of art -- Camouflage Art -- which can be used to prevent computer vision algorithms from detecting faces. This digital art technique combines makeup and hair styling, or other modifications such as facial painting to help avoid automatic face-detection. In this paper, we first study the camouflage interference and its effectiveness on several current state of art techniques in face detection/recognition; and then present a tool that can facilitate digital art design for such camouflage that can fool these computer vision algorithms. This tool can find the prominent or decisive features from facial images that constitute the face being recognized; and give suggestions for camouflage options (makeup, styling, paints) on particular facial features or facial parts. Testing of this tool shows that it can effectively aid the artists or designers in creating camouflage-thwarting designs. The evaluation on suggested camouflages applied on 40 celebrities across eight different face recognition systems (both non-commercial or commercial) shows that 82.5% ~ 100% of times the subject is unrecognizable using the suggested camouflage.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78169275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Best paper session","authors":"R. Zimmerman","doi":"10.1145/3245285","DOIUrl":"https://doi.org/10.1145/3245285","url":null,"abstract":"","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78615638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatio-temporal Fisher vector coding for surveillance event detection. Qiang Chen, Yang Cai, L. Brown, A. Datta, Quanfu Fan, R. Feris, Shuicheng Yan, Alexander Hauptmann, Sharath Pankanti. doi:10.1145/2502081.2502155
We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatio-temporal features applied to the seven event classes defined by the SED task. The approach is based on local spatio-temporal descriptors, called MoSIFT, generated from pairs of video frames. A Gaussian Mixture Model (GMM) is learned to model the distribution of the low-level features, and for each sliding window the Fisher vector encoding [improvedFV] is used to generate the sample representation; an event model is then learned with a linear SVM. The main novelty of our system is the introduction of Fisher vector encoding into video event detection. Fisher vector encoding has demonstrated great success in image classification: the key idea is to model the low-level visual features with a Gaussian Mixture Model and to generate an intermediate vector representation for a bag of features, using higher-order statistics in place of the histograms of the standard bag-of-words (BoW) representation. FV encoding has two useful properties: (a) it naturally separates the video-specific information from the noisy local features, and (b) it supports a linear classification model. We build an efficient implementation of FV encoding that attains a 10x speed-up over real time. We also feed non-trivial object localization techniques, e.g. multi-scale detection and non-maximum suppression, into the video event detection. This approach outperformed all other teams' submissions in TRECVID SED 2012 on four of the seven event types.
Object co-segmentation via discriminative low rank matrix recovery. Yong Li, J. Liu, Zechao Li, Yang Liu, Hanqing Lu. doi:10.1145/2502081.2502195
The goal of this paper is to simultaneously segment the object regions appearing in a set of images of the same object class, known as object co-segmentation. Different from typical methods, which simply assume that the regions common among the images are the object regions, we additionally consider the disturbance from consistent backgrounds, and take regions that are not only common but also salient across the images to be the object regions. To this end, we propose a Discriminative Low Rank matrix Recovery (DLRR) algorithm to divide the over-segmented regions (i.e., superpixels) of a given image set into object and non-object ones. In DLRR, a low-rank matrix recovery term detects salient regions in an image, while a discriminative learning term distinguishes the object regions from all superpixels. An additional regularization term jointly measures the disagreement between the predicted saliency and the objectness probability of each superpixel in the image set. For the unified learning problem connecting these three terms, we design an efficient optimization procedure based on block-coordinate descent. Extensive experiments on two public datasets, MSRC and iCoseg, and comparisons with several state-of-the-art methods demonstrate the effectiveness of our work.
{"title":"Object co-segmentation via discriminative low rank matrix recovery","authors":"Yong Li, J. Liu, Zechao Li, Yang Liu, Hanqing Lu","doi":"10.1145/2502081.2502195","DOIUrl":"https://doi.org/10.1145/2502081.2502195","url":null,"abstract":"The goal of this paper is to simultaneously segment the object regions appearing in a set of images of the same object class, known as object co-segmentation. Different from typical methods, simply assuming that the regions common among images are the object regions, we additionally consider the disturbance from consistent backgrounds, and indicate not only common regions but salient ones among images to be the object regions. To this end, we propose a Discriminative Low Rank matrix Recovery (DLRR) algorithm to divide the over-completely segmented regions (i.e.,superpixels) of a given image set into object and non-object ones. In DLRR, a low-rank matrix recovery term is adopted to detect salient regions in an image, while a discriminative learning term is used to distinguish the object regions from all the super-pixels. An additional regularized term is imported to jointly measure the disagreement between the predicted saliency and the objectiveness probability corresponding to each super-pixel of the image set. For the unified learning problem by connecting the above three terms, we design an efficient optimization procedure based on block-coordinate descent. Extensive experiments are conducted on two public datasets, i.e., MSRC and iCoseg, and the comparisons with some state-of-the-arts demonstrate the effectiveness of our work.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"134 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78654334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimedia framed. E. Churchill. doi:10.1145/2502081.2512088
Multimedia is the combination of several media forms; most typically, the word implies sound and full-motion video. While multimedia technologists concern themselves with the production and distribution of multimedia artifacts, information designers, educationalists, and artists are more concerned with the reception of the artifact, and consider multimedia to be another representational format for multimodal information presentation. Such a perspective leads to questions such as: Is text, or audio or video, or a combination of all three, the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead or in addition? How does the setting affect perception and reception? Is the artifact interactive? Is it changed by audience members? Understanding how an artifact is perceived, received, and interacted with is central to understanding what multimedia is, opening up possibilities and issuing technical challenges as we imagine new forms and formats of multimedia experience. In this talk, I will illustrate how content understanding is modulated by context, by the “framing” of the content. I will discuss audience-participatory production of multimedia and multimodal experiences, and I will conclude with some technical excitements, design and development challenges, and experiential possibilities that lie ahead.
{"title":"Multimedia framed","authors":"E. Churchill","doi":"10.1145/2502081.2512088","DOIUrl":"https://doi.org/10.1145/2502081.2512088","url":null,"abstract":"Multimedia is the combination of several media forms, More typically, the word implies sound and full-motion video. While multimedia technologists concern themselves with the production and distribution of the multimedia artifacts themselves, information designers, educationalists and artists are more concerned with the reception of the artifact, and consider multimedia to be another representational format for multimodal information presentation. Such a perspective leads to questions such as: Is text, or audio or video, or a combination of all three, the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead or in addition? How does the setting affect perception/reception? Is the artifact interactive? Is it changed by audience members? Understanding how an artifact is perceived, received and interacted with is central to understanding what multimedia is, opening up possibilities and issuing technical challenges as we imagine new forms and formats of multimedia experience. In this talk, I will illustrate how content understanding is modulated by context, by the “framing” of the content. I will discuss audience participatory production of multimedia and multimodal experiences. I will conclude with some technical excitements, design/development challenges and experiential possibilities that lie ahead.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"51 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76916829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable training with approximate incremental Laplacian eigenmaps and PCA. Eleni Mantziou, S. Papadopoulos, Y. Kompatsiaris. doi:10.1145/2502081.2508124
This paper describes the approach, the experimental settings, and the results obtained by the proposed methodology at the ACM Yahoo! Multimedia Grand Challenge. Its main contribution is the use of fast and efficient features within a highly scalable semi-supervised learning approach, Approximate Laplacian Eigenmaps (ALEs), and an extension that computes the test-set embedding incrementally, learning concepts in time linear in the number of images (both labelled and unlabelled). Two local visual features, combined through the VLAD feature aggregation method and PCA, improve efficiency and time complexity. Our methodology achieves somewhat better accuracy than the baseline (a linear SVM) on small training sets, and its advantage grows as the training data increase: performing ALE fusion on a training set of 50K images per concept resulted in a MiAP score of 0.4223, which was among the highest scores achieved by the proposed approach.
{"title":"Scalable training with approximate incremental laplacian eigenmaps and PCA","authors":"Eleni Mantziou, S. Papadopoulos, Y. Kompatsiaris","doi":"10.1145/2502081.2508124","DOIUrl":"https://doi.org/10.1145/2502081.2508124","url":null,"abstract":"The paper describes the approach, the experimental settings, and the results obtained by the proposed methodology at the ACM Yahoo! Multimedia Grand Challenge. Its main contribution is the use of fast and efficient features with a highly scalable semi-supervised learning approach, the Approximate Laplacian Eigenmaps (ALEs), and its extension, by computing the test set incrementally for learning concepts in time linear to the number of images (both labelled and unlabelled). A combination of two local visual features combined with the VLAD feature aggregation method and PCA is used to improve the efficiency and time complexity. Our methodology achieves somewhat better accuracy compared to the baseline (linear SVM) in small training sets, but improves the performance as the training data increase. Performing ALE fusion on a training set of 50K/concept resulted in a MiAP score of 0.4223, which was among the highest scores of the proposed approach.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"286 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77080075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}