
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval: latest papers

User expectations from XML element retrieval
Stamatina Betsi, M. Lalmas, A. Tombros, T. Tsikrika
The primary aim of XML element retrieval is to return to users XML elements, rather than whole documents. This poster describes a small study, in which we elicited users' expectations, i.e. their anticipated experience, when interacting with an XML retrieval system, as compared to a traditional 'flat' document retrieval system.
DOI: https://doi.org/10.1145/1148170.1148280 (published 2006-08-06)
Citations: 17
Dynamic test collections: measuring search effectiveness on the live web
I. Soboroff
Existing methods for measuring the quality of search algorithms use a static collection of documents. A set of queries and a mapping from the queries to the relevant documents allow the experimenter to see how well different search engines or engine configurations retrieve the correct answers. This methodology assumes that the document set and thus the set of relevant documents are unchanging. In this paper, we abandon the static collection requirement. We begin with a recent TREC collection created from a web crawl and analyze how the documents in that collection have changed over time. We determine how decay of the document collection affects TREC systems, and present the results of an experiment using the decayed collection to measure a live web search system. We employ novel measures of search effectiveness that are robust despite incomplete relevance information. Lastly, we propose a methodology of "collection maintenance" which supports measuring search performance both for a single system and between systems run at different points in time.
DOI: https://doi.org/10.1145/1148170.1148220 (published 2006-08-06)
Citations: 33
Using small XML elements to support relevance
G. Ramírez, T. Westerveld, A. D. Vries
Small XML elements are often estimated relevant by the retrieval model but they are not desirable retrieval units. This paper presents a generic model that exploits the information obtained from small elements. We identify relationships between small and relevant elements and use this linking information to reinforce the relevance of other elements before removing the small ones. Our experiments using the INEX testbed show the effectiveness of our approach.
DOI: https://doi.org/10.1145/1148170.1148321 (published 2006-08-06)
Citations: 10
Social networks, incentives, and search
J. Kleinberg
The role of network structure has grown in significance over the past ten years in the field of information retrieval, stimulated to a great extent by the importance of link analysis in the development of Web search techniques [4]. This body of work has focused primarily on the network that is most clearly visible on the Web: the network of hyperlinks connecting documents to documents. But the Web has always contained a second network, less explicit but equally important, and this is the social network of its users, with latent person-to-person links encoding a variety of relationships including friendship, information exchange, and influence. Developments over the past few years --- including the emergence of social networking systems and rich social media, as well as the availability of large-scale e-mail and instant messaging datasets --- have highlighted the crucial role played by on-line social networks, and at the same time have made them much easier to uncover and analyze. There is now a considerable opportunity to exploit the information content inherent in these networks, and this prospect raises a number of interesting research challenges. Within this context, we focus on some recent efforts to formalize the problem of searching a social network. The goal is to capture the issues underlying a variety of related scenarios: a member of a social networking system such as MySpace seeks a piece of information that may be held by a friend of a friend [27, 28]; an employee in a large company searches his or her network of colleagues for expertise in a particular subject [9]; a node in a decentralized peer-to-peer file-sharing system queries for a file that is likely to be a small number of hops away [2, 6, 16, 17]; or a user in a distributed IR or federated search setting traverses a network of distributed resources connected by links that may not just be informational but also economic or contractual [3, 5, 7, 8, 13, 18, 21]. In their most basic forms, these scenarios have some essential features in common: a node in a network, without global knowledge, must find a short path to a desired "target" node (or to one of several possible target nodes). To frame the underlying problem, we go back to one of the most well-known pieces of empirical social network analysis --- Stanley Milgram's research into the small-world phenomenon, also known as the "six degrees of separation" [19, 24, 25]. The form of Milgram's experiments, in which randomly chosen starters had to forward a letter to a designated target individual, established not just that short chains connecting far-flung pairs of people are abundant in large social networks, but also that the individuals in these networks, operating with purely local information about their own friends and acquaintances, are able to actually find these chains [10]. The Milgram experiments thus constituted perhaps the earliest indication that large-scale social networks are structured to support this type of decentralized search. Within a family of random-graph models proposed by Watts and Strogatz [26], we have shown that the ability of a network to support this type of decentralized search depends in subtle ways on how its "long-range" connections are correlated with the underlying spatial or organizational structure in which it is embedded [10, 11]. Recent studies using data on communication within organizations [1] and friendships within large on-line communities [15] have established the striking fact that real social networks closely match some of the structural features predicted by these mathematical models. If one looks further at the on-line settings that provided the initial motivation for these issues, there is clearly interest from many directions in their long-term economic implications: essentially, the consequences that follow from viewing distributed information retrieval applications, peer-to-peer systems, or social-networking sites as marketplaces for information and services. How does the problem of decentralized search in a network change when the participants are not simply agents following a fixed algorithm, but strategic actors who make decisions in their own self-interest and may demand compensation for taking part in a protocol? Such considerations bring us into the realm of algorithmic game theory, an active area of current research that uses game-theoretic notions to quantify the performance of systems in which the participants follow their own self-interest [20, 23]. In a simple model for decentralized search in the presence of incentives, we find that performance depends crucially on both the rarity of the information and the richness of the network topology [12]: if the network is too structurally impoverished, an enormous investment may be required to produce a path from a query to an answer.
DOI: https://doi.org/10.1145/1148170.1148172 (published 2006-08-06)
Citations: 18
Exploring the limits of single-iteration clarification dialogs
Jimmy J. Lin, Philip Wu, Dina Demner-Fushman, E. Abels
Single-iteration clarification dialogs, as implemented in the TREC HARD track, represent an attempt to introduce interaction into ad hoc retrieval, while preserving the many benefits of large-scale evaluations. Although previous experiments have not conclusively demonstrated performance gains resulting from such interactions, it is unclear whether these findings speak to the nature of clarification dialogs, or simply the limitations of current systems. To probe the limits of such interactions, we employed a human intermediary to formulate clarification questions and exploit user responses. In addition to establishing a plausible upper bound on performance, we were also able to induce an "ontology of clarifications" to characterize human behavior. This ontology, in turn, serves as the input to a regression model that attempts to determine which types of clarification questions are most helpful. Our work can serve to inform the design of interactive systems that initiate user dialogs.
DOI: https://doi.org/10.1145/1148170.1148251 (published 2006-08-06)
Citations: 6
A graph-based framework for relation propagation and its application to multi-label learning
Ming Wu, Rong Jin
Label propagation exploits the structure of the unlabeled documents by propagating the label information of the training documents to the unlabeled documents. The limitation of existing label propagation approaches is that they can only deal with a single type of object. We propose a framework, named "relation propagation", that allows information to be propagated among multiple types of objects. Empirical studies with multi-label text categorization showed that the proposed algorithm is more effective than several semi-supervised learning algorithms in that it is capable of exploring the correlation among different categories and the structure of unlabeled documents simultaneously.
DOI: https://doi.org/10.1145/1148170.1148333 (published 2006-08-06)
Citations: 6
Comparing two blind relevance feedback techniques
Daqing He, Yefei Peng
Query expansion based on Blind Relevance Feedback (BRF) has been demonstrated to be an effective technique for improving retrieval results. There are two types of BRF-based query expansion. BRF Type 1 (BRFT1) is the original version of BRF, in which query expansion uses the BRF information extracted from the top N documents returned by an initial search on the same collection that contains the target documents [1]. This collection is called the “target collection” in this paper. BRF Type 2 (BRFT2) has been explored as an alternative to BRFT1. Here, query expansion uses the BRF information of the top N documents returned by an initial search on a different collection, called the “expansion collection” in this paper. The expanded query is then used to search the target collection for relevant documents. The effectiveness of BRF depends on two key factors: 1) the documents selected from the initial search should include a reasonable number of documents that are topically relevant to the query; and 2) the selected documents should share a similar genre with the target relevant documents, so that there is a high chance that the important content terms used in the two sets of documents are the same [2]. Both BRFT1 and BRFT2 may encounter situations in which at least one of the two conditions cannot be satisfied. For example, for many topics in the Robust track of the TREC evaluation there are not enough truly relevant documents in the target collection, which makes it difficult to use BRFT1-based query expansion to improve the search results. However, with the amount of electronic resources available, it is often possible that both BRFT1 and BRFT2 can
DOI: https://doi.org/10.1145/1148170.1148299 (published 2006-08-06)
Citations: 6
Probabilistic latent query analysis for combining multiple retrieval sources
Rong Yan, Alexander Hauptmann
Combining the output from multiple retrieval sources over the same document collection is of great importance to a number of retrieval tasks such as multimedia retrieval, web retrieval and meta-search. To merge retrieval sources adaptively according to query topics, we propose a series of new approaches called probabilistic latent query analysis (pLQA), which can associate non-identical combination weights with latent classes underlying the query space. Compared with previous query independent and query-class based combination methods, the proposed approaches have the advantage of being able to discover latent query classes automatically without using prior human knowledge, to assign one query to a mixture of query classes, and to determine the number of query classes under a model selection principle. Experimental results on two retrieval tasks, i.e., multimedia retrieval and meta-search, demonstrate that the proposed methods can uncover sensible latent classes from training data, and can achieve considerable performance gains.
DOI: https://doi.org/10.1145/1148170.1148228 (published 2006-08-06)
Citations: 55
History repeats itself: repeat queries in Yahoo's logs
J. Teevan, Eytan Adar, R. Jones, M. A. S. Potts
Thanks to the ubiquity of the Internet search engine search box, users have come to depend on search engines both to find and re-find information. However, re-finding behavior has not been significantly addressed. Here we look at re-finding queries issued to the Yahoo! search engine by 114 users over a year.
DOI: https://doi.org/10.1145/1148170.1148326 (published 2006-08-06)
Citations: 75
Statistical precision of information retrieval evaluation
G. Cormack, T. Lynam
We introduce and validate bootstrap techniques to compute confidence intervals that quantify the effect of test-collection variability on average precision (AP) and mean average precision (MAP) IR effectiveness measures. We consider the test collection in IR evaluation to be a representative of a population of materially similar collections, whose documents are drawn from an infinite pool with similar characteristics. Our model accurately predicts the degree of concordance between system results on randomly selected halves of the TREC-6 ad hoc corpus. We advance a framework for statistical evaluation that uses the same general framework to model other sources of chance variation as a source of input for meta-analysis techniques.
DOI: https://doi.org/10.1145/1148170.1148262 (published 2006-08-06)
Citations: 108