PH-SSBM: Phrase Semantic Similarity Based Model for Document Clustering
Walaa K. Gad, M. Kamel
2009 Second International Symposium on Knowledge Acquisition and Modeling, 2009-11-30
DOI: 10.1109/KAM.2009.191 (https://doi.org/10.1109/KAM.2009.191)
Citations: 5
Abstract
This paper proposes a novel document representation model, the Phrase Semantic Similarity Based Model (PH-SSBM). The model combines phrase analysis and word analysis, using WordNet as background knowledge, to explore better document representations for clustering. PH-SSBM assigns semantic weights to both the words and the phrases of a document. These weights reflect the semantic relatedness between document terms and capture the semantic information in the documents. PH-SSBM computes the similarity between documents from their matching terms (phrases and words) and the semantic weights of those terms. Experimental results show that PH-SSBM, in conjunction with WordNet, yields a promising performance improvement for text clustering.
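The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a term's semantic weight is its frequency plus the relatedness-scaled frequencies of the other terms in the same document, and that document similarity is a cosine over matching terms. The `RELATEDNESS` table, its scores, and the function names are hypothetical stand-ins for values a real system would derive from WordNet.

```python
import math
from collections import Counter

# Hypothetical relatedness scores in [0, 1]. A real system would derive
# these from WordNet; the values here are invented for illustration.
RELATEDNESS = {
    frozenset(["car", "automobile"]): 0.9,
    frozenset(["car", "engine"]): 0.6,
    frozenset(["automobile", "engine"]): 0.6,
    frozenset(["text clustering", "document clustering"]): 0.8,
}


def relatedness(a, b):
    """Return the toy relatedness of two terms (1.0 for identical terms)."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset([a, b]), 0.0)


def semantic_weights(terms):
    """Assumed weighting: a term's own frequency plus the frequencies of
    the other terms in the document, each scaled by its relatedness."""
    tf = Counter(terms)
    weights = {}
    for term in tf:
        weights[term] = tf[term] + sum(
            relatedness(term, other) * tf[other]
            for other in tf
            if other != term
        )
    return weights


def cosine(w1, w2):
    """Cosine similarity; only terms present in both documents contribute
    to the dot product."""
    dot = sum(w1[t] * w2[t] for t in w1.keys() & w2.keys())
    n1 = math.sqrt(sum(v * v for v in w1.values()))
    n2 = math.sqrt(sum(v * v for v in w2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Terms are words and phrases, mirroring the abstract's "phrases and words".
doc_a = ["car", "engine", "document clustering"]
doc_b = ["automobile", "engine", "document clustering"]
similarity = cosine(semantic_weights(doc_a), semantic_weights(doc_b))
```

In this sketch, `engine` receives a higher weight in both documents because each also contains a related term (`car` or `automobile`), so the shared term counts for more than it would under plain term frequency.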