
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval: Latest Papers

Improving QA retrieval using document priors
J. Mayfield, Paul McNamee
We present a simple way to improve document retrieval for question answering systems. The method biases the retrieval system toward documents that contain words that have appeared in other documents containing answers to the same type of question. The method works with virtually any retrieval system, and exhibits a statistically significant performance improvement over a strong baseline.
DOI: https://doi.org/10.1145/1148170.1148313 (published 2006-08-06)
Citations: 0
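The biasing idea can be illustrated with a minimal sketch. It assumes a vocabulary for each question type (`type_vocab`) has already been collected from documents that contained answers, scores a document's prior as the smoothed fraction of its tokens found in that vocabulary, and adds a weighted log prior to a log-domain retrieval score. The function names, the smoothing, and the `weight` parameter are hypothetical; this is not the authors' implementation.

```python
import math

def question_type_prior(doc_tokens, type_vocab, smoothing=1.0):
    """Hypothetical prior: smoothed fraction of a document's tokens that occur in
    the vocabulary gathered from answer-bearing documents for this question type."""
    hits = sum(1 for token in doc_tokens if token in type_vocab)
    return (hits + smoothing) / (len(doc_tokens) + 2 * smoothing)

def rerank(results, docs, type_vocab, weight=0.5):
    """results: list of (doc_id, log_score) from any retrieval system.
    Adds weight * log(prior) to each score and re-sorts (highest first)."""
    rescored = []
    for doc_id, score in results:
        prior = question_type_prior(docs[doc_id], type_vocab)
        rescored.append((doc_id, score + weight * math.log(prior)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

Because the prior is applied only to the output list, the sketch works on top of any retrieval system's scores; with `weight=0` the original ranking is returned unchanged, so the prior's influence can be tuned on held-out questions.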
Using historical data to enhance rank aggregation
Miriam Fernández, D. Vallet, P. Castells
Rank aggregation is a pervading operation in IR technology. We hypothesize that the performance of score-based aggregation may be affected by artificial, usually meaningless deviations consistently occurring in the input score distributions, which distort the combined result when the individual biases differ from each other. We propose a score-based rank aggregation model where the source scores are normalized to a common distribution before being combined. Early experiments on available data from several TREC collections are shown to support our proposal.
DOI: https://doi.org/10.1145/1148170.1148296 (published 2006-08-06)
Citations: 21
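One way to realize "normalized to a common distribution" is sketched below, assuming each source's historical scores are available. Every new score is mapped through that source's empirical CDF, so all sources land on a common uniform [0, 1] scale, and the normalized scores are then summed (CombSUM). The CDF normalization, the CombSUM combination, and all names are illustrative assumptions rather than the paper's exact model.

```python
from bisect import bisect_right

def make_cdf(historical_scores):
    """Empirical CDF of one source, estimated from its past scores."""
    ordered = sorted(historical_scores)
    n = len(ordered)

    def cdf(x):
        # Fraction of historical scores <= x.
        return bisect_right(ordered, x) / n

    return cdf

def aggregate(runs, histories):
    """runs: {source: {doc: score}}; histories: {source: [past scores]}.
    Normalizes each source through its own CDF, then sums per document."""
    fused = {}
    for source, scores in runs.items():
        cdf = make_cdf(histories[source])
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + cdf(score)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with source A scoring on a 0 to 1 scale (`{"x": 0.9, "y": 0.2}`) and source B on a 0 to 100 scale (`{"x": 30, "y": 40}`), raw summation favors `y` (40.2 vs. 30.9) because B's scale dominates; after CDF normalization against evenly spaced histories, `x` ranks first (1.2 vs. 0.6).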
Enhancing topic tracking with temporal information
Baoli Li, Wenjie Li, Q. Lu
In this paper, we propose a new strategy with time granularity reasoning for utilizing temporal information in topic tracking. Compared with previous work, ours has four distinguishing characteristics. Firstly, we try to determine a set of topic times for a target topic from the given on-topic stories. It helps to avoid the negative influence from other irrelevant times. Secondly, we take into account time granularity variance when deciding whether a coreference relationship exists between two times. Thirdly, both publication time and times presented in texts are considered. Finally, as time is only one attribute of a topic, we increase the similarity between a story and a target topic only when they are related not only temporally but also semantically. Experiments on two TDT corpora show that our method makes good use of temporal information in news stories.
DOI: https://doi.org/10.1145/1148170.1148308 (published 2006-08-06)
Citations: 8
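The final rule in this abstract (raise the similarity only when a story is related to the topic both temporally and semantically) can be sketched as follows. Times are modeled as year/month/day, with missing fields marking a coarser granularity, and two times are treated as coreferent when they agree at the coarser granularity of the pair. The threshold, the boost, and this coreference rule are hypothetical simplifications, not the paper's granularity reasoning.

```python
from typing import NamedTuple, Optional

class TimeExpr(NamedTuple):
    year: int
    month: Optional[int] = None  # None means the expression is only year-level
    day: Optional[int] = None    # None means the expression is at most month-level

def corefer(a: TimeExpr, b: TimeExpr) -> bool:
    """Hypothetical rule: times corefer if they agree down to the coarser granularity."""
    for x, y in ((a.year, b.year), (a.month, b.month), (a.day, b.day)):
        if x is None or y is None:
            return True  # reached the coarser granularity without a conflict
        if x != y:
            return False
    return True

def adjusted_similarity(sem_sim, story_times, topic_times, threshold=0.2, boost=0.1):
    """Boost the semantic similarity only if the story is also temporally related."""
    temporally_related = any(corefer(s, t) for s in story_times for t in topic_times)
    if sem_sim >= threshold and temporally_related:
        return min(1.0, sem_sim + boost)
    return sem_sim
```

Under this rule "August 2006" corefers with "6 August 2006" but not with "September 2006", and a story with low semantic similarity is never boosted merely because its dates match.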
Semantic term matching in axiomatic approaches to information retrieval
Hui Fang, ChengXiang Zhai
A common limitation of many retrieval models, including the recently proposed axiomatic approaches, is that retrieval scores are solely based on exact (i.e., syntactic) matching of terms in the queries and documents, without allowing distinct but semantically related terms to match each other and contribute to the retrieval score. In this paper, we show that semantic term matching can be naturally incorporated into the axiomatic retrieval model through defining the primitive weighting function based on a semantic similarity function of terms. We define several desirable retrieval constraints for semantic term matching and use such constraints to extend the axiomatic model to directly support semantic term matching based on the mutual information of terms computed on some document set. We show that such extension can be efficiently implemented as query expansion. Experiment results on several representative data sets show that, with mutual information computed over the documents in either the target collection for retrieval or an external collection such as the Web, our semantic expansion consistently and substantially improves retrieval accuracy over the baseline axiomatic retrieval model. As a pseudo feedback method, our method also outperforms a state-of-the-art language modeling feedback method.
DOI: https://doi.org/10.1145/1148170.1148193 (published 2006-08-06)
Citations: 138
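Term-level mutual information of the kind this abstract mentions can be estimated from document co-occurrence. The sketch below treats each document as a set of terms, computes MI between the binary events "term a occurs" and "term b occurs", and ranks candidate expansion terms by their summed MI with the query terms. Restricting candidates to terms that co-occur with some query term (MI also rewards negative association), the summation, and the function names are assumptions; the paper's constraint-based weighting is not reproduced here.

```python
import math
from collections import Counter

def term_mi(docs, a, b):
    """Mutual information (nats) between the events 'a occurs' and 'b occurs' in a document."""
    n = len(docs)
    # Only observed cells are counted, so log(0) never arises.
    joint = Counter((a in d, b in d) for d in docs)
    mi = 0.0
    for (xa, xb), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (ya, _), c in joint.items() if ya == xa) / n
        p_y = sum(c for (_, yb), c in joint.items() if yb == xb) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

def expand(query_terms, docs, k=5):
    """Return up to k expansion terms ranked by summed MI with the query terms."""
    vocab = set().union(*docs) - set(query_terms)
    scores = {}
    for term in vocab:
        # Skip terms never seen in the same document as any query term.
        if not any(term in d and q in d for d in docs for q in query_terms):
            continue
        scores[term] = sum(term_mi(docs, q, term) for q in query_terms)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The MI statistics can be computed over the target collection or over an external one; only the `docs` argument changes.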
A complex document information processing prototype
S. Argamon, G. Agam, O. Frieder, D. Grossman, D. Lewis, G. Sohn, E. Voorhees
We developed a prototype for integrated retrieval and aggregation of diverse information contained in scanned paper documents. Such complex document information processing combines several forms of image processing together with textual/linguistic processing to enable effective analysis of complex document collections, a necessity for a wide range of applications. This is the first system to attempt integrated retrieval from complex documents; we report its current capabilities.
DOI: https://doi.org/10.1145/1148170.1148274 (published 2006-08-06)
Citations: 5
High accuracy retrieval with multiple nested ranker
Irina Matveeva, C. Burges, Timo Burkard, Andy Laucius, Leon Wong
High precision at the top ranks has become a new focus of research in information retrieval. This paper presents the multiple nested ranker approach that improves the accuracy at the top ranks by iteratively re-ranking the top scoring documents. At each iteration, this approach uses the RankNet learning algorithm to re-rank a subset of the results. This splits the problem into smaller and easier tasks and generates a new distribution of the results to be learned by the algorithm. We evaluate this approach using different settings on a data set labeled with several degrees of relevance. We use the normalized discounted cumulative gain (NDCG) to measure the performance because it depends not only on the position but also on the relevance score of the document in the ranked list. Our experiments show that making the learning algorithm concentrate on the top scoring results improves precision at the top ten documents in terms of the NDCG score.
DOI: https://doi.org/10.1145/1148170.1148246 (published 2006-08-06)
Citations: 130
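NDCG@k, the measure used in this evaluation, and the nested re-ranking loop can both be sketched briefly. The DCG form below uses the common (2^rel - 1) / log2(rank + 1) gain and discount. Each stage of `nested_rerank` re-sorts only the current top `cutoff` documents with its own scoring function, which here is a placeholder for a trained RankNet model. The cutoffs, scorers, and names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """DCG with gain 2^rel - 1 and discount log2(position + 1), positions starting at 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def nested_rerank(docs, stage_scorers, cutoffs):
    """Re-rank progressively smaller top subsets; each scorer stands in for a
    model trained on the result distribution produced by the previous stage."""
    ranked = list(docs)
    for scorer, cutoff in zip(stage_scorers, cutoffs):
        head = sorted(ranked[:cutoff], key=scorer, reverse=True)
        ranked = head + ranked[cutoff:]
    return ranked
```

Because only the head of the list is re-sorted at each stage, documents below the cutoff keep their earlier positions, which is what lets later stages concentrate on the top-ranked results.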
Building a test collection for complex document information processing
D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, J. Heard
Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. As part of a project to create a prototype system for search and mining of masses of document images, we are assembling a 1.5 terabyte dataset to support evaluation of both end-to-end complex document information processing (CDIP) tasks (e.g., text retrieval and data mining) as well as component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution.
DOI: https://doi.org/10.1145/1148170.1148307 (published 2006-08-06)
Citations: 230
The effect of OCR errors on stylistic text classification
S. Stein, S. Argamon, O. Frieder
Recently, interest is growing in non-topical text classification tasks such as genre classification, sentiment analysis, and authorship profiling. We study to what extent OCR errors affect stylistic text classification from scanned documents. We find that even a relatively high level of errors in the OCRed documents does not substantially affect stylistic classification accuracy.
DOI: https://doi.org/10.1145/1148170.1148325 (published 2006-08-06)
Citations: 5
Semantic search via XML fragments: a high-precision approach to IR
Jennifer Chu-Carroll, J. Prager, Krzysztof Czuba, D. Ferrucci, Pablo Duboue
In some IR applications, it is desirable to adopt a high precision search strategy to return a small set of documents that are highly focused and relevant to the user's information need. With these applications in mind, we investigate semantic search using the XML Fragments query language on text corpora automatically pre-processed to encode semantic information useful for retrieval. We identify three XML Fragment operations that can be applied to a query to conceptualize, restrict, or relate terms in the query. We demonstrate how these operations can be used to address four different query-time semantic needs: to specify target information type, to disambiguate keywords, to specify search term context, or to relate select terms in the query. We demonstrate the effectiveness of our semantic search technology through a series of experiments using the two applications in which we embed this technology and show that it yields significant improvement in precision in the search results.
DOI: https://doi.org/10.1145/1148170.1148247 (published 2006-08-06)
Citations: 62
Learning user interaction models for predicting web search result preferences
Eugene Agichtein, Eric Brill, S. Dumais, R. Ragno
Evaluating user preferences of web search results is crucial for search engine development, deployment, and maintenance. We present a real-world study of modeling the behavior of web search users to predict web search result preferences. Accurate modeling and interpretation of user behavior has important applications to ranking, click spam detection, web search personalization, and other tasks. Our key insight to improving robustness of interpreting implicit feedback is to model query-dependent deviations from the expected "noisy" user behavior. We show that our model of clickthrough interpretation improves prediction accuracy over state-of-the-art clickthrough methods. We generalize our approach to model user behavior beyond clickthrough, which results in higher preference prediction accuracy than models based on clickthrough information alone. We report results of a large-scale experimental evaluation that show substantial improvements over published implicit feedback interpretation methods.
DOI: https://doi.org/10.1145/1148170.1148175 (published 2006-08-06)
Citations: 571
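The idea of modeling query-dependent deviations from expected click behavior can be illustrated with a small sketch. It computes a background click-through rate (CTR) per result position over all queries. Each result's deviation is then its observed clicks minus the clicks expected from the background CTR at the positions where it was shown, averaged per impression, and a preference a > b is emitted when a's deviation exceeds b's by a margin. The log format, the margin, and the pairwise rule are assumptions; the paper also models behavior beyond clickthrough, which this sketch omits.

```python
from collections import defaultdict

def background_ctr(log):
    """CTR per position over all queries. Log entries: (query, url, position, clicked)."""
    shown, clicks = defaultdict(int), defaultdict(int)
    for _, _, pos, clicked in log:
        shown[pos] += 1
        clicks[pos] += int(clicked)
    return {pos: clicks[pos] / shown[pos] for pos in shown}

def click_deviations(log, query):
    """Per-impression deviation of observed clicks from the position-based expectation."""
    bg = background_ctr(log)
    shown, observed, expected = defaultdict(int), defaultdict(float), defaultdict(float)
    for q, url, pos, clicked in log:
        if q != query:
            continue
        shown[url] += 1
        observed[url] += int(clicked)
        expected[url] += bg[pos]  # clicks a typical result at this position would get
    return {url: (observed[url] - expected[url]) / shown[url] for url in shown}

def preferences(deviations, margin=0.1):
    """Pairs (a, b) meaning 'a is preferred over b' for this query."""
    return [(a, b) for a in deviations for b in deviations
            if deviations[a] - deviations[b] > margin]
```

In a log where position 1 is usually clicked and position 2 usually is not, a query whose second result is clicked while its first is skipped yields a preference for the second result, even though raw position bias would suggest the opposite.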