PH-SSBM: Phrase Semantic Similarity Based Model for Document Clustering

2009 Second International Symposium on Knowledge Acquisition and Modeling. Pub Date: 2009-11-30. DOI: 10.1109/KAM.2009.191

Walaa K. Gad, M. Kamel

Citations: 5

Abstract

In this paper, a novel document representation model, the Phrase Semantic Similarity Based Model (PH-SSBM), is proposed. The model combines phrase analysis and word analysis, using WordNet as background knowledge, to explore better document representations for clustering. PH-SSBM assigns semantic weights to both the words and the phrases of a document. The new weights reflect the semantic relatedness between document terms and capture the semantic information in the documents. PH-SSBM measures the similarity between documents based on their matching terms (phrases and words) and the semantic weights of those terms. Experimental results show that PH-SSBM, in conjunction with WordNet, yields a promising performance improvement for text clustering.
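The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a term's semantic weight is its own frequency plus the frequencies of related terms scaled by a relatedness score, and that documents are compared with cosine similarity. The `TOY_RELATEDNESS` table is a hypothetical stand-in for WordNet-derived relatedness, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical relatedness scores standing in for WordNet-based similarity.
# Terms may be single words or multi-word phrases.
TOY_RELATEDNESS = {
    frozenset(["car", "automobile"]): 0.9,
    frozenset(["data mining", "knowledge discovery"]): 0.8,
}


def relatedness(a, b):
    """Return 1.0 for identical terms, a table score for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset([a, b]), 0.0)


def semantic_weights(terms, vocabulary):
    """Assumed weighting: each vocabulary term receives the frequency of every
    document term, scaled by how related the two terms are."""
    counts = Counter(terms)
    return {
        t: sum(relatedness(t, u) * c for u, c in counts.items())
        for t in vocabulary
    }


def cosine(w1, w2):
    """Cosine similarity between two weight dictionaries sharing the same keys."""
    dot = sum(w1[t] * w2[t] for t in w1)
    n1 = math.sqrt(sum(v * v for v in w1.values()))
    n2 = math.sqrt(sum(v * v for v in w2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


def document_similarity(terms_a, terms_b):
    """Compare two documents given as lists of terms (words and phrases)."""
    vocabulary = sorted(set(terms_a) | set(terms_b))
    return cosine(
        semantic_weights(terms_a, vocabulary),
        semantic_weights(terms_b, vocabulary),
    )


doc_a = ["car", "data mining"]
doc_b = ["automobile", "knowledge discovery"]
print(document_similarity(doc_a, doc_b))
```

In this sketch the two documents share no literal term, so a plain term-frequency comparison would score them as unrelated, while the semantically weighted vectors score them as highly similar. A real implementation would replace the toy table with a relatedness measure computed from WordNet.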


