Applying Semantic Suffix Net to suffix tree clustering

Jongkol Janruang, S. Guha
{"title":"Applying Semantic Suffix Net to suffix tree clustering","authors":"Jongkol Janruang, S. Guha","doi":"10.1109/DMO.2011.5976519","DOIUrl":null,"url":null,"abstract":"In this paper we consider the problem of clustering snippets returned from search engines. We propose a technique to invoke semantic similarity in the clustering process. Our technique improves on the well-known STC method, which is a highly efficient heuristic for clustering web search results. However, a weakness of STC is that it cannot cluster semantic similar documents. To solve this problem, we propose a new data structure to represent suffixes of a single string, called a Semantic Suffix Net (SSN). A generalized semantic suffix net is created to represent suffixes of a set of strings by using a new operator to partially combine nets. A key feature of this new operator is to find a joint point by using semantic similarity and string matching; net pairs combination then begins at that joint point. This logic causes the number of nodes and branches of a generalized semantic suffix net to decrease. The operator then uses the line of suffix links as a boundary to separate the net. A generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 3rd Conference on Data Mining and Optimization (DMO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DMO.2011.5976519","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

In this paper we consider the problem of clustering snippets returned from search engines. We propose a technique to invoke semantic similarity in the clustering process. Our technique improves on the well-known STC method, which is a highly efficient heuristic for clustering web search results. However, a weakness of STC is that it cannot cluster semantic similar documents. To solve this problem, we propose a new data structure to represent suffixes of a single string, called a Semantic Suffix Net (SSN). A generalized semantic suffix net is created to represent suffixes of a set of strings by using a new operator to partially combine nets. A key feature of this new operator is to find a joint point by using semantic similarity and string matching; net pairs combination then begins at that joint point. This logic causes the number of nodes and branches of a generalized semantic suffix net to decrease. The operator then uses the line of suffix links as a boundary to separate the net. A generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
语义后缀网在后缀树聚类中的应用
本文研究了搜索引擎返回的片段聚类问题。我们提出了一种在聚类过程中调用语义相似度的技术。我们的技术改进了著名的STC方法,这是一种高效的启发式聚类web搜索结果。然而,STC的一个缺点是不能聚类语义相似的文档。为了解决这个问题,我们提出了一种新的数据结构来表示单个字符串的后缀,称为语义后缀网(SSN)。通过使用一个新的运算符对网络进行部分组合,创建了一个广义语义后缀网络来表示一组字符串的后缀。该算子的一个关键特征是利用语义相似度和字符串匹配来寻找结合点;网对组合就从这个连接点开始。这种逻辑导致广义语义后缀网络的节点和分支数量减少。然后,操作员使用后缀链接线作为分隔网的边界。然后在STC算法中加入广义语义后缀网络,使其能够聚类语义相似的片段。实验结果表明,该算法比传统的STC算法有很大的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Comparison of various Wiener model identification approach in modelling nonlinear process Data mining technique for expertise search in a special interest group knowledge portal A frequent keyword-set based algorithm for topic modeling and clustering of research papers Optimisation model of selective cutting for Timber Harvest Planning in Peninsular Malaysia Reducing network intrusion detection association rules using Chi-Squared pruning technique
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1