Proceedings of the 2015 International Conference on The Theory of Information Retrieval: Latest Papers

Study of Heuristic IR Constraints Under Function Discovery Framework
Parantapa Goswami, Massih-Reza Amini, Éric Gaussier
In this paper we investigate the effect of the heuristic IR constraints on IR term-document scoring functions within the recently proposed function discovery framework. In the earlier study, the constraints were empirically validated as a whole. Moreover, only the group of form constraints was used; the other prominent group, the adjustment constraints, was not considered. In this work we investigate all the constraints individually and study them with two different term frequency normalizations: the normalization scheme used in DFR models and the relative term count normalization used in language models.
DOI: https://doi.org/10.1145/2808194.2809479 (published 2015-09-27)
Citations: 1
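The two term frequency normalizations named in the abstract above can be illustrated with a minimal sketch. It assumes the DFR scheme meant is the one commonly called Normalization 2, tfn = tf · log2(1 + c · avg_len / doc_len), and that relative term count is tf divided by document length; the function names and the default `c=1.0` are illustrative choices, not taken from the paper.

```python
import math


def dfr_normalized_tf(tf, doc_len, avg_doc_len, c=1.0):
    """DFR 'Normalization 2' (assumed variant): rescale tf by relative document length."""
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def relative_tf(tf, doc_len):
    """Relative term count, as used in language models: share of the document taken by the term."""
    return tf / doc_len


if __name__ == "__main__":
    # A term occurring 3 times in a document of average length (100 tokens).
    print(dfr_normalized_tf(3, doc_len=100, avg_doc_len=100))
    print(relative_tf(3, doc_len=100))
```

Both functions penalize long documents, but the DFR variant does so logarithmically while the relative count does so linearly.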
Improving Patent Search by Search Result Diversification
Youngho Kim, W. Bruce Croft
Patent retrieval has some unique features relative to web search. One major task in this domain is finding existing patents that may invalidate new patents, known as prior-art or invalidity search, where search queries can be formulated from query patents (i.e., new patents). Since a patent document generally contains long and complex descriptions, generating effective search queries can be complex and difficult. Typically, these queries must cover diverse aspects of the new patent application in order to retrieve relevant documents that cover the full scope of the patent. Given this context, search diversification techniques can potentially improve the retrieval performance of patent search by introducing diversity into the document ranking. In this paper, we examine the effectiveness for patent search of a recent term-based diversification framework. Using this framework involves developing methods to identify effective phrases related to the topics mentioned in the query patent. In our experiments, we evaluate our diversification approach using standard measures of retrieval effectiveness and diversity, and show significant improvements relative to state-of-the-art baselines.
DOI: https://doi.org/10.1145/2808194.2809455 (published 2015-09-27)
Citations: 5
Query Expansion with Freebase
Chenyan Xiong, Jamie Callan
Large knowledge bases are being developed to describe entities, their attributes, and their relationships to other entities. Prior research mostly focuses on the construction of knowledge bases, while how to use them in information retrieval is still an open problem. This paper presents a simple and effective method of using one such knowledge base, Freebase, to improve query expansion, a classic and widely studied information retrieval task. It investigates two methods of identifying the entities associated with a query, and two methods of using those entities to perform query expansion. A supervised model combines information derived from Freebase descriptions and categories to select terms that are effective for query expansion. Experiments on the ClueWeb09 dataset with TREC Web Track queries demonstrate that these methods are almost 30% more effective than strong, state-of-the-art query expansion algorithms. In addition to improving average performance, some of these methods have better win/loss ratios than baseline algorithms, with 50% fewer queries damaged.
DOI: https://doi.org/10.1145/2808194.2809446 (published 2015-09-27)
Citations: 116
Partially Labeled Supervised Topic Models for Retrieving Similar Questions in CQA Forums
Debasis Ganguly, G. Jones
Manual annotations, e.g. tags and links, of user-generated content in community question answering forums and social media play an important role in making the content searchable. During the active phase of a new question entered into a CQA forum, a moderator or an answerer often has to make a significant effort to manually search for related question threads (which we refer to as documents) that they may consider linking to the current question. This manual effort can be greatly reduced by an automated search process that suggests a list of candidate documents to be linked to the new document. We describe our investigation of link recommendation for this task. We approach the problem as an ad-hoc information retrieval (IR) task in which a new document (question) acts as the query and the intention is to retrieve a list of potentially relevant documents (previously asked questions in the forum), which could then be linked (manually) to the new one. In contrast to standard ad-hoc search, two pieces of human-annotated additional information, namely the tags of the documents and the known links between existing document pairs, can potentially be used to improve the search quality for new questions. To utilize this additional information, we propose a generative model of tagged documents which jointly estimates the distribution of topics corresponding to each tag of a document along with the likelihood of a document being linked to another one. The model predictions are then incorporated in the query likelihood estimate of a standard language model (LM) of IR. Experiments conducted on three months of a crawled StackOverflow dataset show that utilizing the tag-specific topic distributions results in a significant improvement in retrieval of the candidate set of related documents.
DOI: https://doi.org/10.1145/2808194.2809460 (published 2015-09-27)
Citations: 5
Two Operators to Define and Manipulate Themes of a Document Collection
E. D. Buccio, M. Melucci
In this paper, we propose the theme model, which will provide the end user with join and meet operators to define and manipulate themes. These operators have properties that cannot be reduced to the classical logic operators, thus allowing the researchers to model the informative content of documents in a novel way and to rank documents in ways other than those provided by the classical logic. To this end, we introduce the main definitions and properties of the theme model and we link the model to a number of related techniques, thus suggesting how the model can be implemented and applied.
DOI: https://doi.org/10.1145/2808194.2809482 (published 2015-09-27)
Citations: 1
Estimating the Uncertainty of Average F1 Scores
Dell Zhang, Jun Wang, Xiaoxue Zhao
In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliable they are in terms of forecasting the classifier's future performance on unseen data. In this paper, we propose a novel approach to explicitly modelling the uncertainty of average F1 scores through Bayesian reasoning.
DOI: https://doi.org/10.1145/2808194.2809488 (published 2015-09-27)
Citations: 67
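For readers unfamiliar with the two averages mentioned in the abstract above, the following minimal sketch computes micro-averaged and macro-averaged F1 from per-class counts. It shows only the standard point estimates, not the Bayesian uncertainty model the paper proposes; the `counts` layout (class name mapped to true positives, false positives, false negatives) and the example numbers are assumptions made for illustration.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class has no predictions and no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Macro average: compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """Micro average: pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical per-class (tp, fp, fn) counts: a frequent class and a rare one.
    counts = {"frequent": (8, 2, 2), "rare": (1, 3, 3)}
    print(macro_f1(counts), micro_f1(counts))
```

The macro average weights every class equally, so the poorly classified rare class pulls it down more than it pulls down the micro average, which is dominated by the frequent class.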
Pooling for User-Oriented Evaluation Measures
G. Baruah, Adam Roegiest, Mark D. Smucker
Traditional TREC-style pooling methodology relies on relevance predicted by systems to select documents for judgment. This coincides with typical search behaviour (e.g., web search). For temporally ordered streams of documents, however, users encounter documents in temporal order rather than in some predetermined rank order. We investigate a user-oriented pooling methodology that focuses on the documents simulated users would likely read in such temporally ordered streams. Under this user model, many of the relevant documents found in the TREC 2013 Temporal Summarization Track's pooling effort would never be read. Our pooling strategy not only focuses on documents that will be read by (simulated) users; the resultant pools also differ from the standard TREC pools.
DOI: https://doi.org/10.1145/2808194.2809493 (published 2015-09-27)
Citations: 1
Theory of Retrieval: The Retrievability of Information
L. Azzopardi
Retrievability is an important and interesting indicator that can be used in a number of ways to analyse Information Retrieval systems and document collections. Rather than focusing solely on relevance, retrievability examines what is retrieved, how often it is retrieved, and whether a user is likely to retrieve it or not. This is important because a document needs to be retrieved before it can be judged for relevance. In this tutorial, we shall explain the concept of retrievability along with a number of retrievability measures, how it can be estimated and how it can be used for analysis. Since retrieval precedes relevance, we shall also provide an overview of how retrievability relates to effectiveness, describing some of the insights that researchers have discovered thus far. We shall also show how retrievability relates to efficiency, and how the theory of retrievability can be used to improve both effectiveness and efficiency. Then we shall provide an overview of the different applications of retrievability, such as Search Engine Bias and Corpus Profiling, before wrapping up with challenges and opportunities. The final session will look at example problems and ways to analyse and apply retrievability to other problems and domains. Participants are invited to bring their own problems to be discussed after the tutorial. This half-day tutorial is ideal for: (i) researchers curious about retrievability and wanting to see how it can impact their research, (ii) researchers who would like to expand their set of analysis techniques, and/or (iii) researchers who would like to use retrievability to perform their own analysis.
DOI: https://doi.org/10.1145/2808194.2809444 (published 2015-09-27)
Citations: 3
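As a rough illustration of the kind of measure the tutorial abstract above refers to, the sketch below computes a cumulative retrievability score: for each document, the number of queries for which it appears at or above a rank cutoff. It assumes uniform query weights and a simple in-memory `rankings` mapping (query to ranked document IDs); both the data layout and the example values are hypothetical, and the tutorial itself may cover other measures.

```python
from collections import Counter


def cumulative_retrievability(rankings, cutoff):
    """Count, per document, how many queries retrieve it within the top `cutoff` ranks.

    Assumes every query carries the same weight; documents never retrieved score 0.
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


if __name__ == "__main__":
    # Hypothetical ranked result lists for three queries.
    rankings = {
        "q1": ["d1", "d2", "d3"],
        "q2": ["d2", "d1", "d4"],
        "q3": ["d2", "d5"],
    }
    print(cumulative_retrievability(rankings, cutoff=2))
```

Comparing how unevenly these scores are spread over a collection is one way such a measure can be used to analyse retrieval bias.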
A Theoretical Analysis of Cross-lingual Semantic Relatedness in Vector Space Models
Lei Zhang, Thanh Tran, Achim Rettinger
Semantic relatedness is essential for different text processing tasks, especially in the cross-lingual setting due to the vocabulary mismatch problem. Many concept-based solutions to semantic relatedness have been proposed, which vary in the notions of concept and document representation. In our contribution, we provide a unified model that generalizes over the existing approaches to cross-lingual semantic relatedness. It shows that the main existing solutions represent different ways of constructing the concept space, which result in different document representations and implications for semantic relatedness computation. In particular, it allows us to provide theoretical justifications of existing solutions. Through the experimental evaluation, we show that the results support our theoretical findings.
DOI: https://doi.org/10.1145/2808194.2809450 (published 2015-09-27)
Citations: 3
Online News Tracking for Ad-Hoc Information Needs
Jeroen B. P. Vuurens, A. D. Vries, Roi Blanco, P. Mika
Following online news about a specific event can be a difficult task as new information is often scattered across web pages. In such cases, an up-to-date summary of the event would help to inform users and allow them to navigate to articles that are likely to contain relevant and novel details. We propose a three-step approach to online news tracking for ad-hoc information needs. First, we continuously cluster the titles of all incoming news articles. Then, we select the clusters that best fit a user's ad-hoc information need and identify salient sentences. Finally, we select sentences for the summary based on novelty and relevance to the information seen, without requiring an a-priori model of events of interest. We evaluate this approach using the 2013 TREC Temporal Summarization test set and show that compared to existing systems our approach retrieves news facts with significantly higher F-measure and Latency-Discounted Expected Gain.
DOI: https://doi.org/10.1145/2808194.2809474 (published 2015-09-27)
Citations: 10