Paraphrasing invariance coefficient: measuring para-query invariance of search engines

T. Imielinski, Jinyun Yan, Yihan Fang, Kurt Eldridge, Huiwen Yu, Peter Kelly
SEMSEARCH '10, published 2010-04-26
DOI: 10.1145/1863879.1863880 (https://doi.org/10.1145/1863879.1863880)

Abstract

Paraphrasing is the restatement (or reuse) of text in another form that preserves its meaning. A para-query is a paraphrase of a search query. Humans recognize para-queries easily, but search engines are still far from doing so. We claim that a search engine can be called semantic only if it recognizes para-queries, that is, only if it returns the same search results for all para-queries of a given query. Recognizing para-queries is an important and desirable ability for a search engine: it relieves users of the burden of rephrasing queries to improve the relevance of results.

In this paper, we cover two main threads: monolingual para-query generation (PG) and para-query recognition measurement (PRM). Para-query generation aims to automatically generate as many English para-queries as possible for a given query. We propose a novel game, "Rephraser", to tackle this problem. Hundreds of para-query templates are extracted from the game's output and used to compose tens of thousands of para-queries.

The goal of para-query recognition measurement is to examine how well search engines recognize para-queries. We propose the paraphrasing invariance coefficient (PIC), defined as the probability that the search results are the same for a pair of para-queries. Using para-queries generated from the game, we design experiments to measure the PIC of search engines. Results show that today's leading search engines are still inferior to humans at recognizing para-queries. Search still has a long way to go before it is truly semantic.
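To make the PIC definition concrete, the following is a minimal Python sketch, not the authors' implementation. It estimates PIC as the fraction of para-query pairs whose results are the same. The abstract does not say how "the same" is judged, so the criterion used here (identical top-k result lists, order included) is an assumption, and the names `same_results`, `estimate_pic`, `search`, and `toy_index` are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def same_results(a: Sequence[str], b: Sequence[str], k: int = 10) -> bool:
    """Assumed criterion: the top-k result lists match exactly, order included."""
    return list(a[:k]) == list(b[:k])


def estimate_pic(
    para_query_groups: List[List[str]],
    search: Callable[[str], Sequence[str]],
    k: int = 10,
) -> float:
    """Fraction of para-query pairs (within each group) that get the same results.

    Each inner list holds paraphrases of one underlying query; `search` is a
    hypothetical stand-in for a call to a search engine returning result URLs.
    """
    matches = 0
    total = 0
    for group in para_query_groups:
        # Query each paraphrase once, then compare every unordered pair.
        results = {q: search(q) for q in group}
        for q1, q2 in combinations(group, 2):
            total += 1
            if same_results(results[q1], results[q2], k):
                matches += 1
    if total == 0:
        raise ValueError("need at least one pair of para-queries")
    return matches / total


# Toy data standing in for a real engine (made up for illustration).
toy_index = {
    "how to fix a flat tire": ["u1", "u2"],
    "how do I repair a flat tire": ["u1", "u2"],
    "flat tire repair steps": ["u1", "u3"],
}
pic = estimate_pic([list(toy_index)], toy_index.__getitem__, k=2)
print(round(pic, 3))  # 0.333: one of the three pairs has identical results
```

A looser criterion, such as overlap of the top-k result sets above some threshold, could be swapped into `same_results` without changing the rest of the sketch.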