Authors: Yi Zhao, Yu Qiao, K. He
Journal: International Journal of Web Services Research, pp. 59-77 (impact factor 0.8; JCR Q4, Computer Science, Information Systems)
Published: 2019-07-01
DOI: https://doi.org/10.4018/IJWSR.2019070104
A Novel Tagging Augmented LDA Model for Clustering
Clustering has become an increasingly important task in the analysis of large document collections. It aims to organize these documents and to facilitate better search and knowledge extraction. Most existing clustering methods that use user-generated tags consider only the tags' positive influence on automatic clustering performance. The authors argue that not all user-generated tags provide useful information for clustering. In this article, they propose a new clustering solution, HRT-LDA (High Representation Tags Latent Dirichlet Allocation), which accounts for the effects of different tags on clustering performance. To this end, the authors apply a tag filtering strategy and a tag appending strategy based on transfer learning, Word2vec, TF-IDF, and semantic computing. Extensive experiments on real-world datasets show that HRT-LDA outperforms state-of-the-art tagging-augmented LDA methods for clustering.
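The abstract does not spell out how tags are filtered or appended, so the following is only a minimal sketch of the general idea, not the HRT-LDA procedure itself. It assumes a plain TF-IDF score as the sole filtering signal (the article also draws on transfer learning, Word2vec, and semantic computing, none of which appear here). The function names `tfidf_scores` and `filter_and_append_tags`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def filter_and_append_tags(docs, tags, threshold=0.0):
    """Keep tags whose tf-idf in their own document exceeds `threshold`
    (assumed filtering rule), then append the kept tags to the tokens."""
    scores = tfidf_scores(docs)
    augmented = []
    for doc, doc_tags, doc_scores in zip(docs, tags, scores):
        # Tags that never occur in the document text score 0 here.
        kept = [t for t in doc_tags if doc_scores.get(t, 0.0) > threshold]
        augmented.append(doc + kept)
    return augmented


# Toy corpus: three tokenized documents and their user-generated tags.
docs = [
    ["topic", "model", "cluster", "text"],
    ["image", "cluster", "pixel", "text"],
    ["topic", "lda", "text", "word"],
]
tags = [["topic", "text", "cool"], ["pixel", "text"], ["lda", "toread"]]

augmented = filter_and_append_tags(docs, tags)
```

In this toy run, the tag "text" is dropped everywhere because it occurs in every document and so has zero inverse document frequency, while tags such as "cool" and "toread" are dropped because they never occur in the text. The augmented token lists could then be passed to any LDA implementation. Scoring tags that are absent from the text would need an additional signal, such as embedding similarity.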
About the journal:
The International Journal of Web Services Research (IJWSR) is the first refereed, international publication featuring the latest research findings and industry solutions involving all aspects of Web services technology. The journal covers advancements, standards, and practices of Web services, identifies emerging research topics, and defines the future of Web services on grid computing, multimedia, and communication. IJWSR provides an open, formal publication for high-quality articles developed by theoreticians, educators, developers, researchers, and practitioners for those desiring to stay abreast of challenges in Web services technology.