Md. Rashadul Hasan Rakib, Magdalena Jankowska, N. Zeh, E. Milios
{"title":"基于相似矩阵稀疏的短文本聚类改进","authors":"Md. Rashadul Hasan Rakib, Magdalena Jankowska, N. Zeh, E. Milios","doi":"10.1145/3209280.3229114","DOIUrl":null,"url":null,"abstract":"Short text clustering is an important but challenging task. We investigate impact of similarity matrix sparsification on the performance of short text clustering. We show that two sparsification methods (the proposed Similarity Distribution based, and k-nearest neighbors) that aim to retain a prescribed number of similarity elements per text, improve hierarchical clustering quality of short texts for various text similarities. These methods using a word embedding based similarity yield competitive results with state-of-the-art methods for short text clustering especially for general domain, and are faster than the main state-of-the-art baseline.","PeriodicalId":234145,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering 2018","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Improving Short Text Clustering by Similarity Matrix Sparsification\",\"authors\":\"Md. Rashadul Hasan Rakib, Magdalena Jankowska, N. Zeh, E. Milios\",\"doi\":\"10.1145/3209280.3229114\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Short text clustering is an important but challenging task. We investigate impact of similarity matrix sparsification on the performance of short text clustering. We show that two sparsification methods (the proposed Similarity Distribution based, and k-nearest neighbors) that aim to retain a prescribed number of similarity elements per text, improve hierarchical clustering quality of short texts for various text similarities. These methods using a word embedding based similarity yield competitive results with state-of-the-art methods for short text clustering especially for general domain, and are faster than the main state-of-the-art baseline.\",\"PeriodicalId\":234145,\"journal\":{\"name\":\"Proceedings of the ACM Symposium on Document Engineering 2018\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Symposium on Document Engineering 2018\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3209280.3229114\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Symposium on Document Engineering 2018","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3209280.3229114","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Short Text Clustering by Similarity Matrix Sparsification
Short text clustering is an important but challenging task. We investigate impact of similarity matrix sparsification on the performance of short text clustering. We show that two sparsification methods (the proposed Similarity Distribution based, and k-nearest neighbors) that aim to retain a prescribed number of similarity elements per text, improve hierarchical clustering quality of short texts for various text similarities. These methods using a word embedding based similarity yield competitive results with state-of-the-art methods for short text clustering especially for general domain, and are faster than the main state-of-the-art baseline.