Research on Text Clustering Algorithms
Qun Li, Xin-yuan Huang
2010 2nd International Workshop on Database Technology and Applications, 2010-12-06
DOI: 10.1109/DBTA.2010.5659055
The volume of web documents is enormous. Text clustering places the documents that share the most words into the same cluster, so that a web search engine can structure the large result set returned for a given query. In this article, we study three kinds of clustering algorithms: prototype-based, density-based, and hierarchical. We compare two typical algorithms, K-medoids and DBSCAN. The results show that K-medoids is sensitive to the choice of initial center points and that DBSCAN performs better.
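The contrast between the two algorithms can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation or data: the toy documents, the Jaccard distance on word sets (one possible reading of "most words in common"), the simple alternating K-medoids update, and the `eps` and `min_pts` values are all assumptions chosen for illustration.

```python
import re

def tokens(doc):
    """Lower-cased set of words in a document."""
    return set(re.findall(r"[a-z]+", doc.lower()))

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|: small when two documents share most words."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def distance_matrix(docs):
    sets = [tokens(d) for d in docs]
    return [[jaccard_distance(a, b) for b in sets] for a in sets]

def assign(dist, medoids):
    """Label each document with its nearest medoid (ties go to the first)."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop moving."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
                if members else old)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids

def dbscan(dist, eps, min_pts):
    """Density-based clustering; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1          # provisionally noise
            continue
        cluster += 1
        labels[p] = cluster
        queue = list(neighbors)
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point: keep expanding
    return labels

# Hypothetical toy corpus: two topics plus one unrelated document.
docs = [
    "python code function loop",
    "python code function class",
    "python code loop class",
    "football match goal team",
    "football match goal player",
    "football team goal player",
    "weather",
]
dist = distance_matrix(docs)

print(k_medoids(dist, [0, 3])[0])        # [0, 0, 0, 1, 1, 1, 0]
print(k_medoids(dist, [0, 6])[0])        # [0, 0, 0, 0, 0, 0, 1]
print(dbscan(dist, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

On this toy corpus, K-medoids started from one document per topic separates the two topics but absorbs the unrelated document into a cluster, while starting from the unrelated document merges both topics into one cluster. DBSCAN needs no initial centers and marks the unrelated document as noise, although its result depends on the assumed `eps` and `min_pts`.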