Proceedings of the 2015 International Conference on The Theory of Information Retrieval: Latest Articles

Anytime Ranking for Impact-Ordered Indexes

Pub Date : 2015-09-27 DOI: 10.1145/2808194.2809477

Jimmy J. Lin, A. Trotman

The ability of a ranking function to control its own execution time is useful for managing load, reining in outliers, and adapting to different types of queries. We propose a simple yet effective anytime algorithm for impact-ordered indexes that builds on a score-at-a-time query evaluation strategy. In our approach, postings segments are processed in decreasing order of their impact scores, and the algorithm terminates early once a specified number of postings has been processed. With a simple linear model and a few training topics, we can determine this threshold given a time budget in milliseconds. Experiments on two web test collections show that our approach can accurately control query evaluation latency and that aggressive limits on execution time lead to minimal decreases in effectiveness.
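
The strategy described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the index layout (term mapped to a list of `(impact, doc_ids)` segments), the function names, the choice to stop at segment boundaries, and the assumption that the linear model predicts latency from the number of postings processed are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def fit_latency_model(postings_counts, times_ms):
    """Least-squares fit of time_ms = slope * postings + intercept.

    Assumes the training topics supply (postings processed, latency) pairs
    with at least two distinct postings counts.
    """
    n = len(postings_counts)
    mean_x = sum(postings_counts) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in postings_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(postings_counts, times_ms))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def postings_budget(time_budget_ms, slope, intercept):
    """Invert the linear model to turn a time budget into a postings limit."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return max(0, int((time_budget_ms - intercept) / slope))


def anytime_rank(index, query_terms, rho, k=10):
    """Score-at-a-time evaluation that stops after at most rho postings.

    index maps term -> list of (impact, [doc_id, ...]) segments (assumed layout).
    """
    # Gather the segments of all query terms, highest impact first.
    segments = [seg for term in query_terms for seg in index.get(term, [])]
    segments.sort(key=lambda seg: seg[0], reverse=True)

    accumulators = defaultdict(int)
    processed = 0
    for impact, doc_ids in segments:
        # Assumed policy: stop before a segment that would exceed the budget.
        if processed + len(doc_ids) > rho:
            break
        for doc_id in doc_ids:
            accumulators[doc_id] += impact
        processed += len(doc_ids)

    # Highest accumulated impact first; ties broken by document id.
    ranked = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

A caller would first fit the model on a handful of training topics, convert the time budget with `postings_budget`, and pass the result as `rho`. Because the highest-impact segments are consumed first, cutting evaluation short drops only the lowest-impact contributions, which is consistent with the small effectiveness losses the abstract reports.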

Citations: 52

Language-independent Query Representation for IR Model Parameter Estimation on Unlabeled Collections

Pub Date : 2015-09-27 DOI: 10.1145/2808194.2809451

Parantapa Goswami, Massih-Reza Amini, Éric Gaussier

We study the problem of estimating the parameters of standard IR models (such as BM25 or language models) on new collections without any relevance judgments, by using collections for which relevance judgments are already available. We propose different query representations that allow mapping queries (with and without relevance judgments, from different collections, potentially in different languages) into a common space. We then introduce a kernel regression approach to learn the parameters of standard IR models individually for each query in the new, unlabeled collection. Our experiments, conducted on standard English and Indian IR collections, show that our approach can be used to efficiently tune, query by query, standard IR models to new collections, potentially written in different languages. In particular, the versions of the standard IR models we obtain not only outperform the versions with default parameters, but can also outperform the versions in which the parameter values have been optimized globally over a set of queries with target relevance judgments.
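
To make the idea of per-query parameter estimation by kernel regression concrete, here is a minimal Python sketch. It uses a Nadaraya-Watson estimator with a Gaussian kernel, which is one common form of kernel regression and not necessarily the one used in the paper. The query vectors stand in for the language-independent representations, and all names and the bandwidth value are hypothetical.

```python
import math


def gaussian_kernel(u, v, bandwidth=1.0):
    """Similarity between two query vectors in the shared representation space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def predict_parameter(query_vec, train_vecs, train_params, bandwidth=1.0):
    """Kernel-weighted average of parameter values tuned on labeled queries.

    train_vecs:   representations of queries from labeled collections
    train_params: the best parameter value found for each of those queries
                  (for example, a BM25 length-normalization value)
    """
    weights = [gaussian_kernel(query_vec, v, bandwidth) for v in train_vecs]
    total = sum(weights)
    if total == 0.0:
        # Every training query is too far away: fall back to the plain mean.
        return sum(train_params) / len(train_params)
    return sum(w * p for w, p in zip(weights, train_params)) / total
```

A new, unlabeled query whose representation lies close to labeled queries that favored a particular parameter value receives a prediction near that value, which is what allows the model to be tuned query by query without target relevance judgments.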

Citations: 0

Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application to IR

Pub Date : 2015-07-29 DOI: 10.1145/2808194.2809458

Casper Petersen, C. Lioma, J. Simonsen, Birger Larsen

We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28], which represents text as a graph of discourse entities linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, whose application to IR complements recent work on integrating document cohesiveness or comprehensibility into ranking [5, 56].
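
The entropy-based model can be illustrated with a short Python sketch that computes the empirical Shannon entropy of entity n-grams. This is an assumption-laden simplification: the paper's exact estimator is not given in the abstract, the entity sequence is taken as already extracted, and the function name is hypothetical.

```python
import math
from collections import Counter


def entity_ngram_entropy(entities, n=2):
    """Empirical Shannon entropy (in bits) of the entity n-grams in a text.

    entities: discourse entities in order of appearance, e.g. the subject or
              object of each sentence (extraction is assumed done elsewhere).
    """
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Following the abstract's reasoning, a text that keeps returning to the same entity sequences yields low entropy and is treated as more coherent, while a text that keeps introducing new entities yields high entropy and is treated as drifting.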

Citations: 14

Statistical Significance Testing in Information Retrieval: Theory and Practice

Pub Date : 2014-07-03 DOI: 10.1145/2808194.2809445

Ben Carterette

The past 20 years have seen a great improvement in the rigor of information retrieval experimentation, due primarily to two factors: high-quality, public, portable test collections such as those produced by TREC (the Text REtrieval Conference [28]), and the increased practice of statistical hypothesis testing to determine whether measured improvements can be ascribed to something other than random chance. Together these create a very useful standard for reviewers, program committees, and journal editors; work in information retrieval (IR) increasingly cannot be published unless it has been evaluated using a well-constructed test collection and shown to produce a statistically significant improvement over a good baseline. But, as the saying goes, any tool sharp enough to be useful is also sharp enough to be dangerous. Statistical tests of significance are widely misunderstood. Most researchers and developers treat them as a "black box": evaluation results go in and a p-value comes out. But because significance is such an important factor in determining what research directions to explore and what is published, using p-values obtained without thought can have consequences for everyone doing research in IR. Ioannidis has argued that the main consequence in the biomedical sciences is that most published research findings are false [12]; could that be the case in IR as well?
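
As one way of opening the "black box", the following Python sketch shows how a p-value can be computed for two systems evaluated on the same topics. It implements an exact paired sign-flip randomization test, chosen here only for illustration; the abstract does not say which tests the tutorial covers, and the function name is hypothetical.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Two-sided exact randomization test on per-topic score differences.

    Under the null hypothesis the two systems are interchangeable, so the
    sign of each per-topic difference is equally likely to be + or -.
    Enumerates all 2**n sign assignments, so it suits small topic sets only.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

The returned value is the fraction of sign assignments that produce a total difference at least as large as the observed one. With four topics, the smallest possible two-sided p-value is 2/16 = 0.125, which shows directly why a handful of topics can never reach conventional significance thresholds with this test.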

Citations: 14

Proceedings of the 2015 International Conference on The Theory of Information Retrieval

Pub Date : 1900-01-01 DOI: 10.1145/2808194

Citations: 3
