Deep learning vs spectral clustering into an active clustering with pairwise constraints propagation

Nicolas Voiron, A. Benoît, P. Lambert, B. Ionescu
{"title":"Deep learning vs spectral clustering into an active clustering with pairwise constraints propagation","authors":"Nicolas Voiron, A. Benoît, P. Lambert, B. Ionescu","doi":"10.1109/CBMI.2016.7500237","DOIUrl":null,"url":null,"abstract":"In our data driven world, categorization is of major importance to help end-users and decision makers understanding information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without disposing of any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use a partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs, what is called semi-supervised learning. In such use case, Spectral Clustering proved to be an efficient method. Also, Deep Learning outperformed several state of the art classification approaches and it is interesting to test it in our context. In this paper, we firstly introduce the concept of Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Secondly, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods.","PeriodicalId":356608,"journal":{"name":"2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CBMI.2016.7500237","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In our data driven world, categorization is of major importance to help end-users and decision makers understanding information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without disposing of any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use a partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs, what is called semi-supervised learning. In such use case, Spectral Clustering proved to be an efficient method. Also, Deep Learning outperformed several state of the art classification approaches and it is interesting to test it in our context. In this paper, we firstly introduce the concept of Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Secondly, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
深度学习与光谱聚类的对比,形成具有两两约束传播的主动聚类
在我们这个数据驱动的世界里,分类对于帮助最终用户和决策者理解信息结构是非常重要的。监督学习技术依赖于通常难以获得且训练经常过拟合的带注释的样本。另一方面,无监督聚类技术在不处理任何训练数据的情况下研究数据的结构。考虑到任务的难度,监督学习通常优于非监督学习。一种折衷的方法是使用部分知识,以一种明智的方式选择,以提高性能,同时最小化学习成本,即所谓的半监督学习。在这种用例中,谱聚类被证明是一种有效的方法。此外,深度学习优于几种最先进的分类方法,在我们的环境中测试它是很有趣的。本文首先将深度学习的概念引入到主动半监督聚类过程中,并将其与谱聚类进行比较。其次,我们引入了约束传播,并演示了它如何在降低注释成本的同时最大化分区质量。在两个不同的真实数据集上进行了实验验证。结果表明了聚类方法的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Music Tweet Map: A browsing interface to explore the microblogosphere of music A novel architecture of semantic web reasoner based on transferable belief model Simple tag-based subclass representations for visually-varied image classes Crowdsourcing as self-fulfilling prophecy: Influence of discarding workers in subjective assessment tasks EIR — Efficient computer aided diagnosis framework for gastrointestinal endoscopies
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1