Hybrid approach for visualization of documents clusters using GHSOM and sammon projection
P. Butka, J. Pócsová
2013 IEEE 8th International Symposium on Applied Computational Intelligence and Informatics (SACI), 2013-05-23
DOI: 10.1109/SACI.2013.6608994
This paper presents a hybrid approach to visualizing document sets that combines a hierarchical clustering method, based on the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm, with Sammon projection. Algorithms based on self-organizing maps provide a robust clustering method suitable for visualizing larger numbers of documents on grid-based 2D maps. Sammon projection is a nonlinear projection method suited mostly to visualizing smaller sets of objects on (usually 2D) maps. We implemented and tested a combination of these approaches: the initial set of documents is first organized by GHSOM into subsets of similar documents; then, for the clusters at the end of the clustering phase, which contain fewer inputs, Sammon maps are created to distinguish the individual documents within each cluster. Clusters are described by characteristic terms extracted using information gain analysis. The hybrid algorithm was implemented using the existing JBOWL library and tested on English-language documents.
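The second phase described above, projecting the documents of one small leaf cluster onto a 2D map, can be illustrated with a minimal Python sketch of Sammon projection. This is not the JBOWL implementation, and the GHSOM phase is omitted entirely. The function names, the toy document vectors, and the optimizer are all assumptions made for illustration: Sammon's stress is minimized here by plain gradient descent with step halving, not by the pseudo-Newton update of Sammon's original formulation.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def sammon_stress(target, layout, eps=1e-12):
    """Sammon stress: normalized sum of (d* - d)^2 / d* over all pairs."""
    n = len(layout)
    total = 0.0
    error = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_star = max(target[i][j], eps)  # guard against duplicate inputs
            d = math.dist(layout[i], layout[j])
            total += d_star
            error += (d_star - d) ** 2 / d_star
    return error / total


def sammon_gradient(target, layout, eps=1e-12):
    """Gradient of the Sammon stress with respect to every layout point."""
    n = len(layout)
    dim = len(layout[0])
    total = sum(max(target[i][j], eps)
                for i in range(n) for j in range(i + 1, n))
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_star = max(target[i][j], eps)
            d = max(math.dist(layout[i], layout[j]), eps)
            coeff = -2.0 / total * (d_star - d) / (d_star * d)
            for k in range(dim):
                grad[i][k] += coeff * (layout[i][k] - layout[j][k])
    return grad


def sammon_map(points, dim=2, iterations=200, step=1.0, seed=0):
    """Project points to `dim` dimensions; returns (layout, final stress)."""
    rng = random.Random(seed)
    target = pairwise_distances(points)
    layout = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in points]
    stress = sammon_stress(target, layout)
    for _iteration in range(iterations):
        grad = sammon_gradient(target, layout)
        trial_step = step
        for _attempt in range(20):
            candidate = [[y - trial_step * g for y, g in zip(row, grad_row)]
                         for row, grad_row in zip(layout, grad)]
            candidate_stress = sammon_stress(target, candidate)
            if candidate_stress < stress:
                # Accept only steps that lower the stress.
                layout, stress = candidate, candidate_stress
                break
            trial_step /= 2  # step too large: halve it and retry
        else:
            break  # no improving step found; treat as converged
    return layout, stress


if __name__ == "__main__":
    # Hypothetical term-weight vectors for six documents of one leaf cluster.
    docs = [[1.0, 0.9, 0.0, 0.1], [0.9, 1.0, 0.1, 0.0], [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0], [0.0, 0.0, 1.0, 1.0]]
    layout, stress = sammon_map(docs)
    for index, (x, y) in enumerate(layout):
        print(f"doc {index}: ({x:.3f}, {y:.3f})")
    print(f"final stress: {stress:.6f}")
```

Because the cost of each iteration grows quadratically with the number of points, a sketch like this is only practical for small inputs, which matches the abstract's choice to apply Sammon maps to the small clusters left at the end of the GHSOM phase.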