CC-Top: Constrained Clustering for Dynamic Topic Discovery

Jann Goschenhofer, Pranav Ragupathy, C. Heumann, Bernd Bischl, M. Aßenmacher
Proceedings of the First Workshop on Ever Evolving NLP (EvoNLP). DOI: https://doi.org/10.18653/v1/2022.evonlp-1.5

Abstract

Research on multi-class classification of short texts mainly focuses on supervised (transfer) learning approaches, which require a finite set of pre-defined classes that stays constant over time. This work explores deep constrained clustering (CC) as an alternative to supervised learning in a setting with a dynamically changing number of classes, a task we introduce as dynamic topic discovery (DTD). We do so by using pairwise similarity constraints instead of instance-level class labels, which allows for a flexible number of classes while achieving performance competitive with supervised approaches. First, we substantiate this through a series of experiments and show that CC algorithms exhibit predictive performance similar to state-of-the-art supervised learning algorithms while requiring less annotation effort. Second, we demonstrate the overclustering capabilities of deep CC for detecting topics in short-text data sets when the ground-truth class cardinality is unknown during model training. Third, we show that these capabilities can be leveraged for the DTD setting as a step towards dynamic learning over time. Finally, we release our codebase to foster further research in this area.
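To make the idea of pairwise similarity constraints concrete, the following is a minimal sketch and not the paper's implementation. It assumes a model that outputs a probability distribution over K candidate clusters for each text, scores a pair by the inner product of the two distributions, and penalizes disagreement with a must-link (1) or cannot-link (0) constraint via binary cross-entropy. The function names `pairwise_constraints` and `pairwise_loss`, the toy labels, and the choice K = 4 are all hypothetical. Constraints are derived from labels here only to build a toy example; in the setting described above they would replace instance-level labels.

```python
import math
from itertools import combinations


def pairwise_constraints(labels):
    """Build (i, j, same) triples: same=1 is must-link, same=0 is cannot-link.

    Deriving constraints from labels is for illustration only (assumption).
    """
    return [(i, j, int(labels[i] == labels[j]))
            for i, j in combinations(range(len(labels)), 2)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_loss(probs, constraints, eps=1e-12):
    """Mean binary cross-entropy between the predicted same-cluster
    probability (inner product of two cluster distributions) and the
    pairwise constraint."""
    total = 0.0
    for i, j, same in constraints:
        s = sum(p * q for p, q in zip(probs[i], probs[j]))
        s = min(max(s, eps), 1.0 - eps)  # keep log() finite
        total -= same * math.log(s) + (1 - same) * math.log(1.0 - s)
    return total / len(constraints)


# Toy data: three short texts, two topics, but K = 4 output clusters.
# The loss never needs to know the true number of topics.
labels = ["sports", "sports", "politics"]
constraints = pairwise_constraints(labels)

# Assignment that respects the constraints (texts 0 and 1 share a cluster).
good = [softmax([5, 0, 0, 0]), softmax([5, 0, 0, 0]), softmax([0, 5, 0, 0])]
# Assignment that puts all three texts into the same cluster.
bad = [softmax([5, 0, 0, 0])] * 3

good_loss = pairwise_loss(good, constraints)
bad_loss = pairwise_loss(bad, constraints)
print(good_loss < bad_loss)
```

Because the loss depends only on whether two texts share a cluster, the number of output clusters K can be set larger than the (unknown) number of topics, which is the property that overclustering relies on.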