Metadata propagation in the Web using co-citations

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05). Pub Date: 2005-09-19. DOI: 10.1109/WI.2005.95

Camille Prime-Claverie, M. Beigbeder, T. Lafouge

Citations: 4

Abstract

Given the great heterogeneity of the World Wide Web, using metadata on the search-engine side seems a promising approach to information retrieval. However, because manual qualification (assigning metadata values by hand) is not feasible at Web scale, this approach is rarely followed. We propose a semi-automatic method for propagating metadata. In a first step, homogeneous corpora are extracted. In our study we used the following properties: the authority type, the site type, the information type, and the page type. This first step is performed by clustering with a similarity measure based on the co-citation frequency between pages, that is, how often two pages are cited together by the same page. Given the resulting cluster hierarchy, the second step selects a small number of documents to be qualified manually and propagates the metadata values assigned to them to the other documents in the same cluster. We present a qualitative evaluation and a preliminary study of the method's scalability.
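The abstract does not state the exact similarity formula, clustering algorithm, or selection rule, so the following Python sketch of the two steps fills them in with assumptions. Similarity is the co-citation count normalized by the geometric mean of the two pages' in-link counts (a Salton-style cosine). The hierarchy is single-linkage, cut at a fixed `threshold`, which amounts to taking connected components of the graph whose edges have similarity at or above the threshold. The page to qualify manually in each cluster is its most-cited member. The function names, the toy link data, and the `human` annotator are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cocitation_counts(links):
    """links: citing page -> set of pages it links to.
    Returns (cit, cocit): in-link count per page, co-citation count per sorted pair."""
    cit = defaultdict(int)
    cocit = defaultdict(int)
    for targets in links.values():
        targets = sorted(set(targets))
        for t in targets:
            cit[t] += 1
        # every pair cited together by the same source is co-cited once more
        for a, b in combinations(targets, 2):
            cocit[(a, b)] += 1
    return cit, cocit


def similarity(a, b, cit, cocit):
    """Assumed measure: cocit(a, b) / sqrt(cit(a) * cit(b))."""
    key = (a, b) if a < b else (b, a)
    n = cocit.get(key, 0)
    return n / sqrt(cit[a] * cit[b]) if n else 0.0


def clusters_at_threshold(pages, cit, cocit, threshold):
    """Cut of a single-linkage hierarchy: union pages whose similarity >= threshold."""
    parent = {p: p for p in pages}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in cocit:
        if similarity(a, b, cit, cocit) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in pages:
        groups[find(p)].append(p)
    return [sorted(g) for g in groups.values()]


def propagate(clusters, cit, annotate):
    """Qualify one page per cluster (here: the most-cited, ties by name) via
    `annotate`, then copy its metadata value to every page in that cluster."""
    metadata = {}
    for cluster in clusters:
        representative = max(cluster, key=lambda p: (cit[p], p))
        value = annotate(representative)
        for p in cluster:
            metadata[p] = value
    return metadata


# Hypothetical toy data: four citing pages linking to four target pages.
links = {
    "s1": {"uni.edu/a", "uni.edu/b"},
    "s2": {"uni.edu/a", "uni.edu/b"},
    "s3": {"shop.com/x", "shop.com/y"},
    "s4": {"shop.com/x", "shop.com/y", "uni.edu/a"},
}

asked = []


def human(page):
    """Hypothetical stand-in for the manual qualification step."""
    asked.append(page)
    return "academic" if page.startswith("uni.edu") else "commercial"


cit, cocit = cocitation_counts(links)
clusters = clusters_at_threshold(sorted(cit), cit, cocit, threshold=0.5)
metadata = propagate(clusters, cit, human)
print(clusters)
print(metadata)
```

With these toy links, only two pages are qualified by hand (one per cluster), and their values reach all four pages. Cutting the hierarchy at a single fixed threshold is a simplification; the abstract describes selecting documents from the full cluster hierarchy.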
