Bassoma Diallo, Jie Hu, Tianrui Li, G. Khan, Chunyan Ji
2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), published 2019-11-01. DOI: 10.1109/ISKE47853.2019.9170436 (https://doi.org/10.1109/ISKE47853.2019.9170436)
Concept-Enhanced Multi-view Clustering of Document Data
Many works have applied multi-view clustering algorithms to document clustering. One challenging problem in document clustering is the choice of similarity metric. Existing multi-view document clustering methods widely use two measurements: the cosine similarity and the Euclidean distance (ED). The first does not consider the magnitudes of the two vectors. The second cannot capture the dissimilarity between pairs of vectors that share the same ED. In this paper, we propose a multi-view document clustering scheme that overcomes these drawbacks by calculating the heterogeneity between documents with the same ED while taking their magnitudes into consideration. The experimental results show that the proposed similarity function measures the similarity between documents more accurately than the existing metrics, and that the proposed document clustering scheme outperforms several state-of-the-art algorithms.
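The two drawbacks named in the abstract can be illustrated with a minimal sketch. It does not implement the proposed similarity function, which the abstract does not specify; it only shows the standard cosine similarity and Euclidean distance on made-up two-dimensional vectors. All vectors and helper names below are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; ignores their magnitudes."""
    return dot(u, v) / (norm(u) * norm(v))


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Drawback 1: cosine similarity ignores magnitude.
# Hypothetical term-count vectors for a short and a long document
# that use the same two terms in the same proportion.
short_doc = [1.0, 1.0]
long_doc = [3.0, 3.0]
cos_same_direction = cosine_similarity(short_doc, long_doc)
magnitude_ratio = norm(long_doc) / norm(short_doc)

# Drawback 2: two pairs can share the same ED yet relate very differently.
# pair_a: orthogonal vectors (no terms in common).
# pair_b: parallel vectors (same term proportions).
pair_a = ([1.0, 0.0], [0.0, 1.0])
pair_b = ([5.0, 5.0], [6.0, 6.0])
ed_a = euclidean_distance(*pair_a)
ed_b = euclidean_distance(*pair_b)
cos_a = cosine_similarity(*pair_a)
cos_b = cosine_similarity(*pair_b)

if __name__ == "__main__":
    print("cosine(short, long):", cos_same_direction)
    print("magnitude ratio long/short:", magnitude_ratio)
    print("ED pair_a:", ed_a, "ED pair_b:", ed_b)
    print("cosine pair_a:", cos_a, "cosine pair_b:", cos_b)
```

In this sketch the short and long documents get a cosine similarity of about 1 although one vector is three times as long as the other, and the two pairs have the same ED (the square root of 2) although one pair is orthogonal and the other is parallel. A metric of the kind the abstract describes would need to separate such cases.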