
Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Tutorial on Task-Based Search and Assistance
C. Shah, Ryen W. White
While great strides have been made in search and recommendation, challenges and opportunities remain in addressing information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can not only detect the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information. Many scholars in information retrieval, recommender systems, productivity (especially task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks, and the intentions behind performing those tasks, in order to serve them better. However, we are still struggling to support people in task completion; in search and assistance, for example, it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has opened up new modalities for interacting with information, but these agents will need to work more intelligently in understanding context and helping users at the task level. This tutorial will introduce attendees to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). Specifically, it will cover several recent theories, models, and methods that show how to represent tasks and use behavioral data to extract task information. It will then show how this knowledge or these models could contribute to addressing emerging retrieval and recommendation problems.
DOI: https://doi.org/10.1145/3397271.3401422 | Published: 2020-07-25
Citations: 4
Metadata Matters in User Engagement Prediction
Xiang Chen, Saayan Mitra, Viswanathan Swaminathan
Predicting user engagement (e.g., click-through rate, conversion rate) on display ads plays a critical role in delivering the right ad to the right user in online advertising. Existing techniques, spanning Logistic Regression to Factorization Machines and their derivatives, focus on modeling the interactions among handcrafted features to predict user engagement. Little attention has been paid to how the ad fits with the context (e.g., hosted webpage, user demographics). In this paper, we propose to include the metadata feature, which captures the visual appearance of the ad, in the user engagement prediction task. In particular, given a data sample, we combine the basic context features, which have been widely used in existing prediction models, with the metadata feature, which is extracted from the ad using a state-of-the-art deep learning framework, to predict user engagement. To demonstrate the effectiveness of the proposed metadata feature, we compare the performance of widely used prediction models before and after integrating the metadata feature. Our experimental results on a real-world dataset demonstrate that the metadata feature further improves prediction performance.
DOI: https://doi.org/10.1145/3397271.3401201 | Published: 2020-07-25
Citations: 2
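As a rough illustration of the idea in "Metadata Matters in User Engagement Prediction", the sketch below concatenates context features with a metadata (visual) feature vector and scores the result with a plain logistic model. It is a minimal sketch under stated assumptions: the function names, feature values, and weights are all hypothetical, the weights are hand-set rather than learned, and the paper's actual feature extractor and prediction models are not reproduced here.

```python
import math

def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_engagement(context, metadata, weights, bias=0.0):
    """Score one sample with a logistic model over concatenated features.

    `context` and `metadata` are plain lists of floats; passing an empty
    `metadata` list gives the context-only baseline.
    """
    features = list(context) + list(metadata)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical sample: three context features, two metadata features.
context = [1.0, 0.0, 0.5]
metadata = [0.2, -0.1]
weights = [0.4, -0.3, 0.8, 1.5, 0.7]  # hand-set, not learned

p_with_metadata = predict_engagement(context, metadata, weights, bias=-1.0)
p_context_only = predict_engagement(context, [], weights[:3], bias=-1.0)
print(p_with_metadata, p_context_only)
```

Comparing the two probabilities mirrors the before/after comparison described in the abstract, although a real comparison would retrain each model rather than truncate a weight vector.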
Residual-Duet Network with Tree Dependency Representation for Chinese Question-Answering Sentiment Analysis
Guangyi Hu, Chongyang Shi, Shufeng Hao, Yunru Bai
Question-answering sentiment analysis (QASA) is a novel but meaningful sentiment analysis task based on question-answering online reviews. Existing neural network-based models for sentiment analysis of online reviews have already achieved great success. However, the syntactic and implicit semantic connections in the dependency tree have not been fully exploited, especially for Chinese, which has its own specific syntax. In this work, we propose a Residual-Duet Network that leverages textual and tree dependency information for Chinese question-answering sentiment analysis. In particular, we explore the synergies of graph embedding with structural dependency links to learn syntactic information. Transverse and longitudinal compression encoders are developed to capture sentiment evidence with disparate types of compression and different residual connections. We evaluate our model on three Chinese QASA datasets in different domains. Experimental results demonstrate the superiority of our proposed model in Chinese question-answering sentiment analysis.
DOI: https://doi.org/10.1145/3397271.3401226 | Published: 2020-07-25
Citations: 7
APS: An Active PubMed Search System for Technology Assisted Reviews
Dan Li, Panagiotis Zafeiriadis, E. Kanoulas
Systematic reviews constitute the cornerstone of evidence-based medicine. They can guide medical policy-making by synthesizing all available studies on a given topic. However, conducting systematic reviews has become laborious and time-consuming because of the large volume and rapid growth of published literature. Technology-assisted review (TAR) approaches aim to accelerate the screening stage of systematic reviews by combining machine learning algorithms with human relevance feedback. In this work, we built an online active search system for systematic reviews, named APS, by applying a state-of-the-art TAR approach, Continuous Active Learning. The system is built on top of the PubMed collection, a widely used database of biomedical literature. It allows users to conduct abstract screening for systematic reviews. We demonstrate the effectiveness and robustness of APS in detecting relevant literature and reducing the workload of systematic reviews using the CLEF TAR 2017 benchmark.
DOI: https://doi.org/10.1145/3397271.3401401 | Published: 2020-07-25
Citations: 4
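The Continuous Active Learning loop mentioned in the APS abstract can be sketched as follows: score the unscreened abstracts, show the top-scored one to the reviewer, fold the reviewer's judgment back into the model, and repeat. This is only an illustrative sketch. A toy term-overlap scorer stands in for the trained classifier a real system would use, and every name here (`continuous_active_learning`, `oracle`, the sample documents) is hypothetical rather than taken from APS.

```python
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer, enough for the toy example."""
    return text.lower().split()

def score(doc_tokens, pos_counts, neg_counts):
    """Toy relevance score: terms seen in relevant documents add,
    terms seen in non-relevant documents subtract."""
    return sum(pos_counts[t] - neg_counts[t] for t in set(doc_tokens))

def continuous_active_learning(docs, seed_id, oracle, budget):
    """Screen up to `budget` documents, always presenting the top-scored one.

    docs    -- dict mapping document id to abstract text
    seed_id -- id of one document already known to be relevant
    oracle  -- callable doc_id -> bool, standing in for the human reviewer
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    pos_counts = Counter(set(tokens[seed_id]))
    neg_counts = Counter()
    labels = {seed_id: True}
    screened = []
    while len(screened) < budget and len(labels) < len(docs):
        unlabeled = [d for d in docs if d not in labels]
        best = max(unlabeled, key=lambda d: score(tokens[d], pos_counts, neg_counts))
        relevant = oracle(best)  # human relevance feedback
        labels[best] = relevant
        screened.append(best)
        # "Retrain": update the term statistics with the new judgment.
        (pos_counts if relevant else neg_counts).update(set(tokens[best]))
    return screened, labels

docs = {
    "d1": "statin therapy randomized trial cardiovascular outcomes",
    "d2": "deep learning image classification benchmark",
    "d3": "randomized trial of statin dose and stroke risk",
    "d4": "protein folding simulation methods",
    "d5": "statin adherence cohort cardiovascular events",
}
truly_relevant = {"d1", "d3", "d5"}
screened, labels = continuous_active_learning(
    docs, "d1", lambda d: d in truly_relevant, budget=2
)
print(screened)
```

With a budget of two judgments, the loop surfaces both remaining relevant abstracts and never shows the reviewer the two off-topic ones, which is the workload reduction the abstract refers to.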
A Study of Methods for the Generation of Domain-Aware Word Embeddings
Dominic Seyler, Chengxiang Zhai
Word embeddings are essential components of many text data applications. Most work uses "out-of-the-box" embeddings trained on general text corpora, but these can be less effective in domain-specific settings. Thus, how to create "domain-aware" word embeddings is an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings based on both general and domain-specific text corpora: concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Even though the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well; however, the interpolation method consistently works best.
DOI: https://doi.org/10.1145/3397271.3401287 | Published: 2020-07-25
Citations: 2
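Two of the three strategies named in the domain-aware embeddings abstract, concatenation and interpolation, are simple vector operations and can be sketched directly. The sketch below makes several assumptions: embeddings are plain dicts from word to list of floats, only words present in both spaces are kept, the interpolation weight `alpha` is a hypothetical parameter, and the two spaces are assumed to have been aligned beforehand by a separate step that is not shown. The weighted fusion of text data happens at corpus level before training and is not shown either.

```python
def concatenate(general, domain):
    """Join the general and domain vector of each shared word end to end."""
    shared = sorted(general.keys() & domain.keys())
    return {w: list(general[w]) + list(domain[w]) for w in shared}

def interpolate(general, domain, alpha):
    """Blend aligned vectors: alpha * domain + (1 - alpha) * general.

    Assumes both spaces have the same dimensionality and were aligned
    beforehand; alpha = 0 keeps the general vector, alpha = 1 the domain one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    shared = sorted(general.keys() & domain.keys())
    return {
        w: [alpha * d + (1.0 - alpha) * g for g, d in zip(general[w], domain[w])]
        for w in shared
    }

# Hypothetical two-dimensional embeddings.
general = {"cell": [1.0, 0.0], "bank": [0.0, 1.0]}
domain = {"cell": [0.0, 1.0]}

concatenated = concatenate(general, domain)
blended = interpolate(general, domain, alpha=0.25)
print(concatenated, blended)
```

Concatenation doubles the dimensionality and needs no alignment, while interpolation keeps the original dimensionality but is only meaningful once the two spaces share a coordinate system.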
Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval
Song Liu, Shengsheng Qian, Yang Guan, Jiawei Zhan, Long Ying
Hashing-based cross-modal search, which aims to map features from multiple modalities into binary codes, has attracted increasing attention because of its storage and search efficiency, especially in large-scale database retrieval. Recent unsupervised deep cross-modal hashing methods have shown promising results. However, existing approaches typically suffer from two limitations: (1) they usually learn cross-modal similarity information separately or in a redundant fusion manner, which may fail to capture semantic correlations among instances from different modalities sufficiently and effectively; (2) they seldom consider sampling and weighting schemes for unsupervised cross-modal hashing, resulting in hash codes that lack satisfactory discriminative ability. To overcome these limitations, we propose a novel unsupervised deep cross-modal hashing method called Joint-modal Distribution-based Similarity Hashing (JDSH) for large-scale cross-modal retrieval. First, we propose a novel cross-modal joint-training method that constructs a joint-modal similarity matrix to fully preserve the cross-modal semantic correlations among instances. Second, we propose a sampling and weighting scheme termed the Distribution-based Similarity Decision and Weighting (DSDW) method for unsupervised cross-modal hashing, which generates more discriminative hash codes by pushing semantically similar instance pairs closer and pulling semantically dissimilar instance pairs apart. The experimental results demonstrate the superiority of JDSH over several unsupervised cross-modal hashing methods on two public datasets, NUS-WIDE and MIRFlickr.
DOI: https://doi.org/10.1145/3397271.3401086 | Published: 2020-07-25
Citations: 70
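To make the notion of a joint-modal similarity matrix in the JDSH abstract more concrete, the sketch below builds a cosine similarity matrix per modality and fuses the two with a weighted sum. The weighted sum and the parameter `beta` are assumptions chosen for illustration; the abstract does not spell out the paper's construction, which may well differ. The feature vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(features):
    """Pairwise cosine similarities among the instances of one modality."""
    return [[cosine(u, v) for v in features] for u in features]

def joint_similarity(img_feats, txt_feats, beta=0.5):
    """Fuse image-side and text-side similarities for the same instances.

    Row i of `img_feats` and row i of `txt_feats` describe the same instance.
    """
    if len(img_feats) != len(txt_feats):
        raise ValueError("both modalities must cover the same instances")
    s_img = similarity_matrix(img_feats)
    s_txt = similarity_matrix(txt_feats)
    n = len(s_img)
    return [
        [beta * s_img[i][j] + (1.0 - beta) * s_txt[i][j] for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical image-text pairs with tiny feature vectors.
img_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
txt_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
S = joint_similarity(img_feats, txt_feats, beta=0.5)
print(S)
```

Instances 0 and 1 look identical on the image side but unrelated on the text side, so the fused entry lands halfway between; a single-modality matrix would miss that disagreement.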
Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval
Zhan Yang, J. Long, Lei Zhu, Wenti Huang
Hashing techniques have recently been applied successfully to similarity search problems in information retrieval because of their significantly reduced storage and high-speed search capabilities. However, the hash codes learned by most recent cross-modal hashing methods fail to preserve adequate information comprehensively, resulting in less than desirable performance. To address this limitation, we propose a novel method termed Nonlinear Robust Discrete Hashing (NRDH) for cross-modal retrieval. The main idea behind NRDH is motivated by the success of neural networks, i.e., nonlinear descriptors, in representation learning: using nonlinear descriptors instead of simple linear transformations is more in line with the complex relationships that exist between the common latent representation and heterogeneous multimedia data in the real world. In NRDH, we first learn a common latent representation through nonlinear descriptors to encode complementary and consistent information from the features of the heterogeneous multimedia data. Moreover, an asymmetric learning scheme is proposed to correlate the learned hash codes with the common latent representation. Empirically, we demonstrate that NRDH successfully generates a comprehensive common latent representation that significantly improves the quality of the learned hash codes. Then, NRDH adopts a linear learning strategy to quickly learn the hash function from the learned hash codes. Extensive experiments on two benchmark datasets highlight the superiority of NRDH over several state-of-the-art methods.
DOI: https://doi.org/10.1145/3397271.3401152 | Published: 2020-07-25
Citations: 14
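The NRDH abstract ends with a linear hash function applied at query time. The sketch below shows only that final retrieval step, under stated assumptions: each modality has its own projection matrix into a shared code space, a bit is set when the projection is positive, and database items are ranked by Hamming distance to the query code. The matrices `W_img` and `W_txt` are hand-set and hypothetical; in the paper they would be learned from the hash codes, and none of the nonlinear or asymmetric learning is reproduced here.

```python
def hash_code(x, W):
    """Binary code from a linear projection: bit k is 1 when row k of W
    has a positive dot product with x."""
    return tuple(
        1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0
        for row in W
    )

def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(p != q for p, q in zip(a, b))

def rank_by_hamming(query_code, db_codes):
    """Database ids sorted by increasing Hamming distance to the query."""
    return sorted(db_codes, key=lambda k: hamming(query_code, db_codes[k]))

# Hypothetical per-modality projections into a shared 3-bit code space.
W_img = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
W_txt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, -1.0, 0.0]]

images = {"img_a": [0.9, -0.2], "img_b": [-0.5, 0.8], "img_c": [0.3, 0.6]}
image_codes = {name: hash_code(x, W_img) for name, x in images.items()}

text_query = [0.7, -0.1, 0.4]
query_code = hash_code(text_query, W_txt)
ranking = rank_by_hamming(query_code, image_codes)
print(query_code, ranking)
```

Because both modalities map into the same short binary codes, a text query can be compared with stored images using nothing more than bit comparisons, which is where the storage and speed benefits come from.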
Query Rewriting for Voice Shopping Null Queries
Iftah Gamzu, Marina Haikin, N. Halabi
Voice shopping using natural language introduces new challenges related to customer queries, such as handling mispronounced, misexpressed, and misunderstood queries. Voice null queries, which result in no offers, have a negative impact on the customer's shopping experience. Query rewriting (QR) attempts to automatically replace null queries with alternatives that lead to relevant results. We present a new approach for pre-retrieval QR of voice shopping null queries. Our proposed QR framework first generates alternative queries using a search index-based approach that targets different potential failures in voice queries. Then, a machine-learning component ranks these alternatives, and the original query is amended by the selected alternative. We provide an experimental evaluation of our approach based on data logs of a commercial voice assistant and an e-commerce website, demonstrating that it outperforms several baselines by more than 22%. Our evaluation also highlights an interesting phenomenon: web shopping null queries are considerably different from, and apparently easier to fix than, voice queries. This further substantiates the use of specialized mechanisms for the voice domain. We believe that our proposed framework, which maps tail queries to head queries, is of independent interest since it can be extended and applied to other domains.
DOI: https://doi.org/10.1145/3397271.3401052 | Published: 2020-07-25
Citations: 3
Detecting User Community in Sparse Domain via Cross-Graph Pairwise Learning
Zheng Gao, Hongsong Li, Zhuoren Jiang, Xiaozhong Liu
Cyberspace hosts abundant interactions between users and different kinds of objects, and their relations are often encapsulated as bipartite graphs. Detecting user communities in such heterogeneous graphs is an essential task for uncovering user information needs and further enhancing recommendation performance. While several main cyber domains carry high-quality graphs, most others can be quite sparse. However, because users may appear in multiple domains (graphs), their high-quality activities in the main domains can support community detection in the sparse ones; e.g., a user's behavior on Google can help thousands of applications locate that user's local community when the user logs in to those applications with a Google ID. In this paper, we propose Pairwise Cross-graph Community Detection (PCCD), a model that copes with the sparse-graph problem by drawing on external graph knowledge to learn pairwise community closeness between users instead of detecting communities directly. In particular, to avoid absorbing excessive propagated information, the model uses a two-level filtering module that selects the most informative connections through both community-level and node-level filters. Subsequently, a Community Recurrent Unit (CRU) is designed to estimate pairwise user community closeness. Extensive experiments on two real-world graph datasets validate our model against several strong alternatives. Supplementary experiments also validate its robustness on graphs with varied sparsity scales.
DOI: https://doi.org/10.1145/3397271.3401055; publication date: 2020-07-25
Citations: 5
Leveraging Social Media for Medical Text Simplification
Nikhil Pattisapu, Nishant Prabhu, Smriti Bhati, Vasudeva Varma
Patients are increasingly using the web to understand medical information, make health decisions, and validate physicians' advice. However, most of this content is tailored to an expert audience, so people with inadequate health literacy often find it difficult to access, comprehend, and act upon this information. Medical text simplification aims to alleviate this problem by computationally simplifying medical text. Most text simplification methods employ neural seq-to-seq models for this task. However, training such models requires a corpus of aligned complex and simple sentences. Creating such a dataset manually is effort-intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a neural model based on a denoising autoencoder that leverages the simple writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best-known medical text simplification models across multiple automated and human evaluation metrics. Our model achieves an improvement of up to 16.52% over the existing best-performing model on SARI, which is the primary metric for evaluating text simplification models.
DOI: https://doi.org/10.1145/3397271.3401105; publication date: 2020-07-25
Citations: 7