Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Articles

Privacy-aware Document Ranking with Neural Signals

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331189

Jinjin Shao, Shiyu Ji, Tao Yang

Recent work on neural ranking has achieved solid relevance improvements by exploring similarities between documents and queries using word embeddings. It is an open problem how to leverage such an advancement for privacy-aware ranking, which is important for top K document search on the cloud. Since neural ranking adds more complexity in score computation, it is difficult to prevent the server from discovering embedding-based semantic features and inferring privacy-sensitive information. This paper analyzes the critical leakages in interaction-based neural ranking and studies countermeasures to mitigate such leakage. It proposes a privacy-aware neural ranking scheme that integrates tree ensembles with kernel value obfuscation and a soft match map based on adaptively-clustered term closures. The paper also presents an evaluation with two TREC datasets on the relevance of the proposed techniques and the trade-offs for privacy and storage efficiency.

Citations: 8

Nobody Said it Would be Easy: A Decade of R&D Projects in Information Access from Thomson over Reuters to Refinitiv

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331444

Jochen L. Leidner

In this talk, I survey a small, non-random sample of research projects in information access carried out as part of the Thomson Reuters family of companies over a period of more than 10 years. I analyse how these projects are similar to and different from academic research efforts and attempt a critical (and personal, so certainly subjective) assessment of what academia can do for industry, and what industry can do for research in terms of R&D efforts. I will conclude with some advice for academic-industry collaboration initiatives in several areas of vertical information services (legal, finance, pharma and regulatory/compliance) as well as news.

Citations: 0

Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331262

Aymé Arango, Jorge Pérez, Bárbara Poblete

Hate speech is an important problem that is seriously affecting the dynamics and usefulness of online social communities. Large-scale social platforms are currently investing substantial resources into automatically detecting and classifying hateful content, without much success. On the other hand, the results reported by state-of-the-art systems indicate that supervised approaches achieve almost perfect performance, but only within specific datasets. In this work, we analyze this apparent contradiction between existing literature and actual applications. We study closely the experimental methodology used in prior work and its generalizability to other datasets. Our findings reveal methodological issues, as well as an important dataset bias. As a consequence, performance claims of the current state of the art are significantly overestimated. The problems that we have found are mostly related to data overfitting and sampling issues. We discuss the implications for current research and re-conduct experiments to give a more accurate picture of the current state-of-the-art methods.

Citations: 138

Revisiting Approximate Metric Optimization in the Age of Deep Neural Networks

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331347

Sebastian Bruch, M. Zoghi, Michael Bendersky, Marc Najork

Learning-to-Rank is a branch of supervised machine learning that seeks to produce an ordering of a list of items such that the utility of the ranked list is maximized. Unlike most machine learning techniques, however, the objective cannot be directly optimized using gradient descent methods as it is either discontinuous or flat everywhere. As such, learning-to-rank methods often optimize a loss function that either is loosely related to or upper-bounds a ranking utility instead. A notable exception is the approximation framework originally proposed by Qin et al. that facilitates a more direct approach to ranking metric optimization. We revisit that framework almost a decade later in light of recent advances in neural networks and demonstrate its superiority empirically. Through this study, we hope to show that the ideas from that work are more relevant than ever and can lay the foundation of learning-to-rank research in the age of deep neural networks.
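
The idea of replacing a discontinuous ranking metric with a smooth surrogate can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly described form of the approximation, in which an item's rank is smoothed as one plus a sum of sigmoids over score differences, and then plugs that smooth rank into NDCG. The function names and the `temperature` parameter are assumptions made for this example.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def approx_ranks(scores, temperature=1.0):
    """Smooth rank of item i: 1 + sum over j != i of sigmoid((s_j - s_i) / T).

    As T approaches 0 this tends to the true 1-based rank (for distinct scores).
    """
    ranks = []
    for i, s_i in enumerate(scores):
        rank = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                rank += _sigmoid((s_j - s_i) / temperature)
        ranks.append(rank)
    return ranks


def approx_ndcg(scores, labels, temperature=1.0):
    """NDCG computed with smooth ranks, so it varies continuously with the scores."""
    ranks = approx_ranks(scores, temperature)
    dcg = sum((2 ** y - 1) / math.log2(1.0 + r) for y, r in zip(labels, ranks))
    ideal = sorted(labels, reverse=True)
    idcg = sum((2 ** y - 1) / math.log2(2.0 + k) for k, y in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

With a small temperature the surrogate closely tracks the true metric; larger temperatures give a smoother function at the cost of a looser approximation. In a neural setting the same expression would be written in an automatic-differentiation framework so that its negative can serve as a loss.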

Citations: 54

Prototype-guided Attribute-wise Interpretable Scheme for Clothing Matching

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331245

Xianjing Han, Xuemeng Song, Jianhua Yin, Yinglong Wang, Liqiang Nie

Recently, as an essential part of people's daily life, clothing matching has gained increasing research attention. Most existing efforts focus on numerical compatibility modeling between fashion items with advanced neural networks, and hence suffer from poor interpretability, which makes them less applicable in real-world applications. In fact, people prefer to know not only whether the given fashion items are compatible, but also reasonable interpretations as well as suggestions on how to make an incompatible outfit harmonious. Considering that comprehensively interpretable clothing matching is largely unexplored, in this work we propose a prototype-guided attribute-wise interpretable compatibility modeling (PAICM) scheme, which seamlessly integrates latent compatible/incompatible prototype learning and compatibility modeling with the Bayesian personalized ranking (BPR) framework. In particular, the latent attribute interaction prototypes, learned by non-negative matrix factorization (NMF), are treated as templates to interpret the discordant attribute and suggest an alternative item for each fashion item pair. Extensive experiments on a real-world dataset have demonstrated the effectiveness of our scheme.

Citations: 26

WCIS 2019: 1st Workshop on Conversational Interaction Systems

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331648

Abhinav Rastogi, A. Papangelis, Rahul Goel, Chandra Khatri

The first workshop on Conversational Interaction Systems is held in Paris, France on July 25th, 2019, co-located with the ACM Special Interest Group on Information Retrieval (SIGIR). The goal of the workshop is to bring together researchers from academia and industry to discuss the challenges and future of conversational agents and interactive systems. The workshop has an exciting program that spans a number of subareas including: multi-modal conversational interfaces, dialogue accessibility, and scaling such systems. The program includes eight invited talks, a lively panel discussion on emerging topics, and presentation of original research papers.

Citations: 1

Multimodal Data Fusion with Quantum Inspiration

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331419

Qiuchi Li

Language understanding is multimodal. During human communication, messages are conveyed not only by words in textual form, but also through speech patterns, gestures or facial emotions of the speakers. Therefore, it is crucial to fuse information from different modalities to achieve a joint comprehension. With the rapid progress in the deep learning field, neural networks have emerged as the most popular approach for addressing multimodal data fusion [1, 6, 7, 12]. While these models can effectively combine multimodal features by learning from data, they nevertheless lack an explicit exhibition of how different modalities are related to each other, due to the inherent low interpretability of neural networks [2]. In the meantime, Quantum Theory (QT) has given rise to principled approaches for incorporating interactions between textual features into a holistic textual representation [3, 5, 8, 10], where the concepts of superposition and entanglement have been universally exploited to formulate interactions. The advantages of those models in capturing complicated correlations between textual features have been observed. We hereby propose research on quantum-inspired multimodal data fusion, claiming that the limitation of multimodal data fusion can be tackled by quantum-driven models. In particular, we propose to employ superposition to formulate intra-modal interactions, while the interplay between different modalities is expected to be captured by entanglement measures. By doing so, the interactions within multimodal data may be rendered explicitly in a unified quantum formalism, increasing the performance and interpretability for concrete multimodal tasks. It will also expand the application domains of quantum theory to multimodal tasks, where only preliminary efforts have been made [11]. We therefore aim at answering the following research question: RQ. Can we fuse multimodal data with quantum-inspired models?

To answer this question, we propose to fuse multimodal data with complex-valued neural networks, motivated by the theoretical link between neural networks and quantum theory [4] and advances in complex-valued neural networks [9]. Our model begins with a separate complex-valued embedding learned for each unimodal data source based on existing works [5, 10], which inherently assume superposition between intra-modal features. Then we construct a many-body system in an entangled state for multimodal data, where cross-modality interactions are naturally reflected by entanglement measures. Quantum measurement operators are applied to the entangled state to address a concrete multimodal task at hand. The whole process is implemented by a complex-valued neural network, which is able to learn how multimodal features are combined from data, and at the same time explain the combination by means of quantum superposition and entanglement measures. We plan to examine our proposed models on CMU-MOSI [12] and CMU-MOSEI [1], which are benchmark multimodal sentiment analysis datasets. The goal on these datasets is to classify sentiment into 2, 5 or 7 classes, with textual, visual and acoustic features as input. We expect to see effectiveness comparable to state-of-the-art models, and we will explore superposition and entanglement measures to better understand multimodal interactions.

Citations: 1

Demonstrating Requirement Search on a University Degree Search Application

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331402

Nicholas Mendez, Kyle De Freitas, Inzamam Rahaman

In many domains of information retrieval, we are required to retrieve documents that describe requirements on a predefined set of terms. A requirement is a relationship between a set of terms and the document. As requirements become more complex by allowing optional terms, alternative terms, and combinations of terms, efficiently retrieving documents becomes more challenging due to the exponential size of the search space. In this paper, we propose RevBoMIR, which utilizes a modified Boolean Model for Information Retrieval to retrieve requirements-based documents without sacrificing the expressiveness of requirements. Our proposed approach is particularly useful in domains where documents embed criteria that can be satisfied by mandatory, alternative or disqualifying terms to determine their retrieval. Finally, we present a graph model for representing document requirements, and demonstrate Requirement Search via a university degree search application.
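
To make the notion of a requirement concrete, the sketch below shows one plain way to check mandatory, alternative and disqualifying terms against the terms a searcher holds. It is a hypothetical illustration only and is not RevBoMIR: the `Requirement` structure, the `satisfies` and `search` functions, and the sample degrees are all invented for this example, and it scans every document rather than using any index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical requirement attached to one document (e.g. one degree)."""
    mandatory: frozenset = frozenset()      # every term must be held
    alternatives: tuple = ()                # groups; at least one term per group
    disqualifying: frozenset = frozenset()  # holding any of these excludes the document


def satisfies(req, terms):
    """Return True if the set of held terms meets the requirement."""
    held = set(terms)
    if not req.mandatory <= held:
        return False
    if req.disqualifying & held:
        return False
    # Each alternative group must share at least one term with the held set.
    return all(held & group for group in req.alternatives)


def search(documents, terms):
    """Return the names of all documents whose requirement is satisfied."""
    return sorted(name for name, req in documents.items() if satisfies(req, terms))


# Invented sample data for a degree search.
DEGREES = {
    "BSc Computer Science": Requirement(
        mandatory=frozenset({"mathematics"}),
        alternatives=(frozenset({"physics", "chemistry"}),),
    ),
    "BSc Foundation Science": Requirement(
        alternatives=(frozenset({"physics", "chemistry", "biology"}),),
        disqualifying=frozenset({"mathematics"}),
    ),
}
```

For instance, `search(DEGREES, {"mathematics", "physics"})` would return only the computer science degree, because holding mathematics disqualifies the foundation programme in this invented data. A linear scan like this is what becomes expensive as requirements and collections grow.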

Citations: 0

A Horizontal Patent Test Collection

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331346

M. Lupu, A. Bampoulidis, L. Papariello

We motivate the need for, and describe the contents of, a novel patent research collection that is publicly available free of charge and covers multimodal and multilingual data from six patent authorities. The new patent test collection complements existing patent test collections, which are vertical (one domain or one authority over many years). Instead, the new collection is horizontal: it includes all technical domains from the major patenting authorities over the relatively short time span of two years. In addition to bringing together documents currently scattered across different test collections, the collection provides, for the first time, Korean documents, to complement those from Europe, the US, Japan, and China. This new collection can be used for a variety of tasks beyond traditional information retrieval. We exemplify this with a task of high relevance today: de-anonymisation.

Citations: 0

Evaluating Risk-Sensitive Text Retrieval

Pub Date : 2019-07-18 DOI: 10.1145/3331184.3331423

R. Benham

Search engines with a loyal user base face the difficult task of improving overall effectiveness while maintaining the quality of existing workflows. Risk-sensitive evaluation tools are designed to address that task, but they currently do not support inference over multiple baselines. Our research objectives are to: 1) Survey and revisit risk evaluation, taking into account frequentist and Bayesian inference approaches for comparing against multiple baselines; 2) Apply that new approach, evaluating a novel web search technique that leverages previously run queries to improve the effectiveness of a new user query; and 3) Explore how risk-sensitive component interactions affect end-to-end effectiveness in a search pipeline.
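
As background for what a risk-sensitive measure looks like against a single baseline, the sketch below computes a URisk-style score: per-query gains over the baseline count at face value, while per-query losses are penalised by a factor of (1 + alpha). This is an illustrative sketch of a widely used single-baseline measure and is not taken from the work described above; the function name and the sample scores are assumptions.

```python
def urisk(run_scores, baseline_scores, alpha=1.0):
    """URisk-style score for one run against one baseline.

    run_scores and baseline_scores are per-query effectiveness values
    (for example average precision) in the same query order.
    Wins count as-is; losses are weighted by (1 + alpha).
    """
    if len(run_scores) != len(baseline_scores) or not run_scores:
        raise ValueError("need two non-empty score lists of equal length")
    total = 0.0
    for run, base in zip(run_scores, baseline_scores):
        delta = run - base
        total += delta if delta >= 0 else (1.0 + alpha) * delta
    return total / len(run_scores)
```

With `alpha=0` the score reduces to the mean difference from the baseline; increasing `alpha` makes a system that hurts some queries look progressively worse, even if its average effectiveness is unchanged. Extending this kind of comparison to several baselines at once is the gap the abstract points to.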

Citations: 0
