Building enriched web page representations using link paths

Tim Weninger, ChengXiang Zhai, Jiawei Han
{"title":"Building enriched web page representations using link paths","authors":"Tim Weninger, ChengXiang Zhai, Jiawei Han","doi":"10.1145/2309996.2310006","DOIUrl":null,"url":null,"abstract":"Anchor text has a history of enriching documents for a variety of tasks within the World Wide Web. Anchor texts are useful because they are similar to typical Web queries, and because they express the document's context. Therefore, it is a common practice for Web search engines to incorporate incoming anchor text into the document's standard textual representation. However, this approach will not suffice for documents with very few inlinks, and it does not incorporate the document's full context. To mediate these problems, we employ link paths, which contain anchor texts from paths through the Web ending at the document in question. We propose and study several different ways to aggregate anchor text from link paths, and we show that the information from link paths can be used to (1) improve known item search in site-specific search, and (2) map Web pages to database records. We rigorously evaluate our proposed approach on several real world test collections. We find that our approach significantly improves performance over baseline and existing techniques in both tasks.","PeriodicalId":91270,"journal":{"name":"HT ... : the proceedings of the ... ACM Conference on Hypertext and Social Media. ACM Conference on Hypertext and Social Media","volume":"7 1","pages":"53-62"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"HT ... : the proceedings of the ... ACM Conference on Hypertext and Social Media. ACM Conference on Hypertext and Social Media","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2309996.2310006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Anchor text has a history of enriching documents for a variety of tasks within the World Wide Web. Anchor texts are useful because they are similar to typical Web queries, and because they express the document's context. Therefore, it is a common practice for Web search engines to incorporate incoming anchor text into the document's standard textual representation. However, this approach will not suffice for documents with very few inlinks, and it does not incorporate the document's full context. To mediate these problems, we employ link paths, which contain anchor texts from paths through the Web ending at the document in question. We propose and study several different ways to aggregate anchor text from link paths, and we show that the information from link paths can be used to (1) improve known item search in site-specific search, and (2) map Web pages to database records. We rigorously evaluate our proposed approach on several real world test collections. We find that our approach significantly improves performance over baseline and existing techniques in both tasks.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用链接路径构建丰富的网页表示
锚文本具有丰富万维网内各种任务的文档的历史。锚文本很有用,因为它们类似于典型的Web查询,而且它们表达了文档的上下文。因此,对于Web搜索引擎来说,将输入的锚文本合并到文档的标准文本表示中是一种常见的做法。但是,这种方法对于链接很少的文档是不够的,而且它没有包含文档的完整上下文。为了解决这些问题,我们使用了链接路径,它包含了以问题文档结尾的Web路径中的锚文本。我们提出并研究了几种不同的从链接路径聚合锚文本的方法,并表明链接路径的信息可用于(1)改进特定站点搜索中的已知项目搜索,以及(2)将网页映射到数据库记录。我们在几个真实世界的测试集合上严格地评估了我们提出的方法。我们发现,在这两个任务中,我们的方法比基线和现有技术显著提高了性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
HT '22: 33rd ACM Conference on Hypertext and Social Media, Barcelona, Spain, 28 June 2022- 1 July 2022 HT '21: 32nd ACM Conference on Hypertext and Social Media, Virtual Event, Ireland, 30 August 2021 - 2 September 2021 HT '20: 31st ACM Conference on Hypertext and Social Media, Virtual Event, USA, July 13-15, 2020 Detecting Changes in Suicide Content Manifested in Social Media Following Celebrity Suicides. QualityRank: assessing quality of wikipedia articles by mutually evaluating editors and texts
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1