An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web

2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology. Pub Date: 2012-12-04. DOI: 10.1109/WI-IAT.2012.239

Tarique Anwar, M. Abulaish

Citations: 6

Abstract

In this paper, we propose a Markov Clustering (MCL) based text mining approach for namesake disambiguation on the Web. The novelty of the proposed technique lies in modeling the collection of web pages as a weighted graph and applying MCL to crystallize it into clusters, each containing the web pages related to one particular namesake individual. To cluster web pages retrieved through search engines, the proposed method relies on three broad and realistic aspects: content overlap, structure overlap, and local context overlap. The efficacy of the proposed method is demonstrated through experimental evaluations on standard datasets.
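
The graph-plus-MCL idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three overlap measures are computed or combined, so the `weights` matrix below is a hypothetical pairwise page-similarity matrix with made-up values. The function name `mcl`, its parameters, and the added self-loops are assumptions for illustration only. It shows the basic MCL loop (expansion, then inflation), not the authors' implementation.

```python
import numpy as np

def mcl(weights, inflation=2.0, max_iter=100, tol=1e-6):
    """Basic Markov Clustering on a symmetric weighted adjacency matrix."""
    m = np.asarray(weights, dtype=float).copy()
    # Assumption: add unit self-loops, a common MCL preprocessing step.
    np.fill_diagonal(m, m.diagonal() + 1.0)
    m /= m.sum(axis=0, keepdims=True)          # make columns sum to 1
    for _ in range(max_iter):
        prev = m
        m = m @ m                              # expansion: two-step random walk
        m = np.power(m, inflation)             # inflation: boost strong flows
        m /= m.sum(axis=0, keepdims=True)      # re-normalize the columns
        if np.abs(m - prev).max() < tol:
            break
    # Each row that still carries flow is an attractor; the columns it
    # reaches form one cluster. A set removes duplicate attractor rows.
    found = set()
    for row in m:
        members = tuple(int(i) for i in np.flatnonzero(row > 1e-3))
        if members:
            found.add(members)
    return sorted(found)

# Hypothetical similarities between five web pages returned for one name:
# pages 0-2 overlap strongly, pages 3-4 overlap strongly, and pages 2 and 3
# share only a weak link.
weights = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
clusters = mcl(weights)
print(clusters)
```

In this sketch each returned tuple would correspond to the pages attributed to one namesake. Raising `inflation` generally yields finer-grained clusters, and lowering it yields coarser ones.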
