Proceedings of the 2015 International Conference on The Theory of Information Retrieval: Latest Papers
Optimal Packing in Simple-Family Codecs
A. Trotman, Michael H. Albert, Blake Burgess
The Simple family of codecs is popular for encoding postings lists for a search engine because its members are both space-efficient and fast to decode. These algorithms pack as many integers into a codeword as possible before moving on to the next codeword. This technique is known as left-greedy. This contribution proves that left-greedy is not optimal and then goes on to introduce a dynamic programming solution to find the optimal packing. Experiments on .gov2 and INEX Wikipedia 2009 show that although this is an interesting theoretical result, left-greedy is empirically near optimal in effectiveness and efficiency.
DOI: https://doi.org/10.1145/2808194.2809483 (published 2015-09-27)
Citations: 2
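The gap between left-greedy and optimal packing described in the abstract above can be illustrated with a small sketch. It uses the nine Simple-9 layouts (28 payload bits per codeword) and assumes, for illustration only, that a codeword must be filled exactly, with no padding. The function names and the example list are hypothetical, and the dynamic program is a generic minimum-codeword recurrence, not necessarily the formulation used in the paper.

```python
# Simple-9 layouts: 28 payload bits split into `count` fields of `bits` bits each.
SELECTORS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]


def fits(values, start, count, bits):
    """True if exactly `count` integers starting at `start` each fit in `bits` bits."""
    chunk = values[start:start + count]
    return len(chunk) == count and all(v < (1 << bits) for v in chunk)


def left_greedy(values):
    """Integers per codeword when each codeword takes as many integers as it can."""
    words, i = [], 0
    while i < len(values):
        # SELECTORS is ordered from most to fewest integers per codeword.
        # Assumes every value is below 2**28, so the (1, 28) layout always fits.
        count = next(c for c, b in SELECTORS if fits(values, i, c, b))
        words.append(count)
        i += count
    return words


def optimal(values):
    """Integers per codeword for a packing that uses the fewest codewords."""
    n = len(values)
    best = [0] * (n + 1)    # best[i]: fewest codewords needed for values[i:]
    choice = [0] * (n + 1)  # choice[i]: integers placed in the first codeword
    for i in range(n - 1, -1, -1):
        best[i], choice[i] = min(
            (1 + best[i + c], c) for c, b in SELECTORS if fits(values, i, c, b)
        )
    words, i = [], 0
    while i < n:
        words.append(choice[i])
        i += choice[i]
    return words


# Hypothetical input: one 14-bit value followed by nine 1-bit values.
example = [10000] + [1] * 9
greedy_words = left_greedy(example)
optimal_words = optimal(example)
```

Under these assumptions, left-greedy spends three codewords on the example (2, 7, and 1 integers) while the dynamic program needs only two (1 and 9 integers): grabbing the second integer early leaves eight small values, and no Simple-9 layout holds exactly eight.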
Dynamic Information Retrieval: Theoretical Framework and Application
Marc Sloan, Jun Wang
Theoretical frameworks like the Probability Ranking Principle and its more recent Interactive Information Retrieval variant have guided the development of ranking and retrieval algorithms for decades, yet they are not capable of helping us model problems in Dynamic Information Retrieval, which exhibit the following three properties: an observable user signal, retrieval over multiple stages, and an overall search intent. In this paper a new theoretical framework for retrieval in these scenarios is proposed. We derive a general dynamic utility function for optimizing over these types of tasks, which takes into account the utility of each stage and the probability of observing user feedback. We apply our framework to experiments over TREC data in the dynamic multi-page search scenario as a practical demonstration of its effectiveness, to frame the discussion of its use and limitations, and to compare it against the existing frameworks.
DOI: https://doi.org/10.1145/2808194.2809457 (published 2015-09-27)
Citations: 16
Embedded Representations of Lexical and Knowledge-Base Semantics
A. McCallum
Bio: Andrew McCallum is a Professor, Director of the Center for Data Science, and Director of the Information Extraction and Synthesis Laboratory in the College of Information and Computer Sciences at the University of Massachusetts Amherst. He has published over 250 papers in many areas of AI, including natural language processing, machine learning, data mining and reinforcement learning, and his work has received over 40,000 citations.
DOI: https://doi.org/10.1145/2808194.2808195 (published 2015-09-27)
Citations: 1
Building a Self-Contained Search Engine in the Browser
Jimmy J. Lin
JavaScript engines inside modern web browsers are capable of running sophisticated multi-player games, rendering impressive 3D scenes, and supporting complex, interactive visualizations. Can this processing power be harnessed for information retrieval? This paper explores the feasibility of building a JavaScript search engine that runs completely self-contained on the client side within the browser - this includes building the inverted index, gathering term statistics for scoring, and performing query evaluation. The design takes advantage of the IndexedDB API, which is implemented by the LevelDB key-value store inside Google's Chrome browser. Experiments show that although the performance of the JavaScript prototype falls far short of the open-source Lucene search engine, it is sufficiently responsive for interactive applications. This feasibility demonstration opens the door to interesting applications and architectures.
DOI: https://doi.org/10.1145/2808194.2809478 (published 2015-09-27)
Citations: 8
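The three steps named in the abstract above (building the inverted index, gathering term statistics, and evaluating queries) can be sketched in a few lines. This is an illustrative Python sketch, not the paper's JavaScript prototype: a plain dictionary stands in for the key-value store, and the class name, whitespace tokenization, and TF-IDF-style scoring are all assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Hypothetical in-memory inverted index; a dict stands in for a key-value store."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.num_docs = 0                  # collection statistic used for scoring

    def add(self, doc_id, text):
        """Index one document using naive lowercase whitespace tokenization."""
        self.num_docs += 1
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        """Return up to k doc ids ranked by a simple TF-IDF-style score."""
        scores = Counter()
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Document frequency is simply the length of the postings entry.
            idf = math.log(1 + self.num_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common(k)]


index = TinyIndex()
index.add("a", "web search in the browser")
index.add("b", "browser games")
index.add("c", "search search engine")
results = index.search("search engine")
```

Only the postings of the query terms are touched at query time, which is the property that distinguishes an inverted index from a full scan of every document.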
An Analysis of Theories of Search and Search Behavior
L. Azzopardi, G. Zuccon
Theories of search and search behavior can be used to glean insights and generate hypotheses about how people interact with retrieval systems. This paper examines three such theories: the long-standing Information Foraging Theory, along with the more recently proposed Search Economic Theory and the Interactive Probability Ranking Principle. Our goal is to develop a model for ad-hoc topic retrieval using each approach, all within a common framework, in order to (1) determine what predictions each approach makes about search behavior, and (2) show the relationships, equivalences and differences between the approaches. While each approach takes a different perspective on modeling searcher interactions, we show that under certain assumptions, they lead to similar hypotheses regarding search behavior. Moreover, we show that the models are complementary to each other, but operate at different levels (i.e., sessions, patches and situations). We further show how the differences between the approaches lead to new insights into the theories and new models. This contribution will not only lead to further theoretical developments, but also enable practitioners to employ one of the three equivalent models depending on the data available.
DOI: https://doi.org/10.1145/2808194.2809447 (published 2015-09-27)
Citations: 22
Learning Asymmetric Co-Relevance
Fiana Raiber, Oren Kurland, Filip Radlinski, Milad Shokouhi
Several applications in information retrieval rely on asymmetric co-relevance estimation; that is, estimating the relevance of a document to a query under the assumption that another document is relevant. We present a supervised model for learning an asymmetric co-relevance estimate. The model uses different types of similarities with the assumed relevant document and the query, as well as document-quality measures. Empirical evaluation demonstrates the merits of using the co-relevance estimate in various applications, including cluster-based and graph-based document retrieval. Specifically, the resultant performance transcends that of using a wide variety of alternative estimates, mostly symmetric inter-document similarity measures that dominate past work.
DOI: https://doi.org/10.1145/2808194.2809454 (published 2015-09-27)
Citations: 8
The Feasibility of Brute Force Scans for Real-Time Tweet Search
Yulu Wang, Jimmy J. Lin
The real-time search problem requires making ingested documents immediately searchable, which presents architectural challenges for systems built around inverted indexing. In this paper, we explore a radical proposition: What if we abandon document inversion and instead adopt an architecture based on brute force scans of document representations? In such a design, "indexing" simply involves appending the parsed representation of an ingested document to an existing buffer, which is simple and fast. Quite surprisingly, experiments with TREC Microblog test collections show that query evaluation with brute force scans is feasible and performance compares favorably to a traditional search architecture based on an inverted index, especially if we take advantage of vectorized SIMD instructions and multiple cores in modern processor architectures. We believe that such a novel design is worth further exploration by IR researchers and practitioners.
DOI: https://doi.org/10.1145/2808194.2809489 (published 2015-09-27)
Citations: 2
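The design described in the abstract above, where "indexing" is just an append to a buffer and every query scans all document representations, might look roughly like the following. This is a minimal Python sketch under stated assumptions: the class name, the term-count representation, and the term-frequency scoring are hypothetical, and the SIMD and multi-core optimizations mentioned in the abstract are not shown.

```python
import heapq
from collections import Counter


class ScanIndex:
    """Hypothetical append-only buffer that is searched by a full linear scan."""

    def __init__(self):
        self.buffer = []  # (doc_id, term counts) in arrival order

    def add(self, doc_id, text):
        """'Indexing' only appends the parsed document; nothing is inverted."""
        self.buffer.append((doc_id, Counter(text.lower().split())))

    def search(self, query, k=10):
        """Score every buffered document and keep the k best matches."""
        terms = query.lower().split()
        scored = ((sum(tf[t] for t in terms), doc_id) for doc_id, tf in self.buffer)
        hits = [(score, doc_id) for score, doc_id in scored if score > 0]
        return [doc_id for _, doc_id in heapq.nlargest(k, hits)]


tweets = ScanIndex()
tweets.add(1, "storm hits coast")
tweets.add(2, "storm storm warning")
tweets.add(3, "sunny day")
top = tweets.search("storm")
```

A document is searchable as soon as `add` returns, which is the real-time property the abstract emphasizes; the cost moves to query time, where every buffered document is visited.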
Using Part-of-Speech N-grams for Sensitive-Text Classification
G. Mcdonald, C. Macdonald, I. Ounis
Freedom of Information legislation in many western democracies, including the United Kingdom (UK) and the United States of America (USA), states that citizens typically have the right to access government documents. However, certain sensitive information is exempt from release into the public domain. For example, in the UK, FOIA Exemption 27 (International Relations) excludes the release of information that might damage the interests of the UK abroad. Therefore, the process of reviewing government documents for sensitivity is essential to determine if a document must be redacted before it is archived, or closed until the information is no longer sensitive. With the increased volume of digital government documents in recent years, there is a need for new tools to assist the digital sensitivity review process. Therefore, in this paper we propose an automatic approach for identifying sensitive text in documents by measuring the amount of sensitivity in sequences of text. Using government documents reviewed by trained sensitivity reviewers, we focus on an aspect of FOIA Exemption 27 which can have a major impact on international relations, namely, information supplied in confidence. We show that our approach leads to markedly increased recall of sensitive text, while achieving a very high level of precision, when compared to a baseline that has been shown to be effective at identifying sensitive text in other domains.
DOI: https://doi.org/10.1145/2808194.2809496 (published 2015-09-27)
Citations: 18
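The title of the entry above names part-of-speech n-grams as the feature, but the abstract does not describe how they are extracted. As a generic illustration only, the sketch below counts n-grams over the tag sequence of an already-tagged sentence. The function name, the hand-assigned Penn Treebank style tags, and the example sentence are assumptions, not taken from the paper.

```python
from collections import Counter


def pos_ngrams(tagged_tokens, n=3):
    """Count n-grams over the part-of-speech tags of (token, tag) pairs."""
    tags = [tag for _, tag in tagged_tokens]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Hypothetical sentence with hand-assigned tags; a real pipeline would use a tagger.
sentence = [
    ("the", "DT"), ("minister", "NN"), ("told", "VBD"),
    ("us", "PRP"), ("in", "IN"), ("confidence", "NN"),
]
features = pos_ngrams(sentence, n=2)
```

Counts like these could serve as features for a text classifier; because they abstract away the actual words, they capture sentence structure rather than vocabulary.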
Entity Linking in Queries: Tasks and Evaluation
Faegheh Hasibi, K. Balog, Svein Erik Bratsberg
Annotating queries with entities is one of the core problem areas in query understanding. While seeming similar, the task of entity linking in queries is different from entity linking in documents and requires a methodological departure due to the inherent ambiguity of queries. We differentiate between two specific tasks, semantic mapping and interpretation finding, discuss current evaluation methodology, and propose refinements. We examine publicly available datasets for these tasks and introduce a new manually curated dataset for interpretation finding. To further deepen the understanding of task differences, we present a set of approaches for effectively addressing these tasks and report on experimental results.
DOI: https://doi.org/10.1145/2808194.2809473 (published 2015-09-27)
Citations: 55
A Theoretical Analysis of Two-Stage Recommendation for Cold-Start Collaborative Filtering
Xiaoxue Zhao, Jun Wang
In this paper, we present a theoretical framework for tackling the cold-start collaborative filtering problem, where unknown targets (items or users) keep coming to the system, and there is a limited number of resources (users or items) that can be allocated and related to them. The solution requires a trade-off between exploitation and exploration since with the limited recommendation opportunities, we need to, on one hand, allocate the most relevant resources right away, but, on the other hand, it is also necessary to allocate resources that are useful for learning the target's properties in order to recommend more relevant ones in the future. In this paper, we study a simple two-stage recommendation combining a sequential and a batch solution together. We first model the problem with the partially observable Markov decision process (POMDP) and provide its exact solution. Then, through an in-depth analysis over the POMDP value iteration solution, we identify that an exact solution can be abstracted as selecting resources that are not only highly relevant to the target according to the initial-stage information, but also highly correlated, either positively or negatively, with other potential resources for the next stage. With this finding, we propose an approximate solution to ease the intractability of the exact solution. Our initial results on synthetic data and the MovieLens 100K dataset confirm our theoretical development and analysis.
DOI: https://doi.org/10.1145/2808194.2809459 (published 2015-09-27)
Citations: 4