Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

cwl_eval: An Evaluation Tool for Information Retrieval
L. Azzopardi, Paul Thomas, Alistair Moffat
We present a tool ("cwl_eval") which unifies many metrics typically used to evaluate information retrieval systems using test collections. In the CWL framework metrics are specified via a single function which can be used to derive a number of related measurements: Expected Utility per item, Expected Total Utility, Expected Cost per item, Expected Total Cost, and Expected Depth. The CWL framework brings together several independent approaches for measuring the quality of a ranked list, and provides a coherent user model-based framework for developing measures based on utility (gain) and cost. Here we outline the CWL measurement framework; describe the cwl_eval architecture; and provide examples of how to use it. We provide implementations of a number of recent metrics, including Time Biased Gain, U-Measure, Bejewelled Measure, and the Information Foraging Based Measure, as well as previous metrics such as Precision, Average Precision, Discounted Cumulative Gain, Rank-Biased Precision, and INST. By providing state-of-the-art and traditional metrics within the same framework, we promote a standardised approach to evaluating search effectiveness.
DOI: https://doi.org/10.1145/3331184.3331398
Published: 2019-07-18
Citations: 22
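The cwl_eval abstract above says that a single function yields Expected Utility per item, Expected Total Utility, and Expected Depth. The following is a minimal sketch of that idea for Rank-Biased Precision only. It is not the cwl_eval code or its API: the function name `rbp_cwl`, the dictionary keys, and the choice to treat unjudged ranks beyond the list as zero gain are all assumptions made for illustration. It uses the standard RBP weighting W(i) = (1 - p) p^(i-1), with total utility and depth obtained by dividing by W(1).

```python
from typing import Dict, Sequence


def rbp_cwl(gains: Sequence[float], p: float) -> Dict[str, float]:
    """CWL-style measurements for Rank-Biased Precision (hypothetical helper).

    Assumes the user continues past every rank with constant probability p,
    and that ranks beyond the end of `gains` contribute zero gain.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # W(i) = (1 - p) * p**(i - 1): share of attention given to rank i.
    weights = [(1.0 - p) * p ** i for i in range(len(gains))]
    expected_utility = sum(w * g for w, g in zip(weights, gains))
    w1 = 1.0 - p  # W(1), the weight of the first rank
    return {
        "expected_utility_per_item": expected_utility,
        "expected_total_utility": expected_utility / w1,
        "expected_depth": 1.0 / w1,
    }


result = rbp_cwl([1.0, 0.0, 1.0, 0.0], p=0.5)
print(result)
```

With p = 0.5 the expected depth is 2 items, so the total utility is twice the per-item utility. Cost-based measurements would follow the same pattern with per-rank costs in place of gains.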
One-Class Order Embedding for Dependency Relation Prediction
Meng-Fen Chiang, Ee-Peng Lim, Wang-Chien Lee, Xavier Jayaraj Siddarth Ashok, Philips Kokoh Prasetyo
Learning the dependency relations among entities and the hierarchy formed by these relations by mapping entities into some order embedding space can effectively enable several important applications, including knowledge base completion and prerequisite relations prediction. Nevertheless, it is very challenging to learn a good order embedding due to the existence of partial ordering and missing relations in the observed data. Moreover, most application scenarios do not provide non-trivial negative dependency relation instances. We therefore propose a framework that performs dependency relation prediction by exploring both rich semantic and hierarchical structure information in the data. In particular, we propose several negative sampling strategies based on graph-specific centrality properties, which supplement the positive dependency relations with appropriate negative samples to effectively learn order embeddings. This research not only addresses the needs of automatically recovering missing dependency relations, but also unravels dependencies among entities using several real-world datasets, such as course dependency hierarchy involving course prerequisite relations, job hierarchy in organizations, and paper citation hierarchy. Extensive experiments are conducted on both synthetic and real-world datasets to demonstrate the prediction accuracy as well as to gain insights using the learned order embedding.
DOI: https://doi.org/10.1145/3331184.3331249
Published: 2019-07-18
Citations: 5
Training Streaming Factorization Machines with Alternating Least Squares
Xueyu Mao, Saayan Mitra, Sheng Li
Factorization Machines (FM) have been widely applied in industrial applications for recommendations. Traditionally, FM models are trained in batch mode, which entails training the model with large datasets every few hours or days. Such a training procedure cannot capture trends that evolve in real time in large volumes of streaming data. In this paper, we propose an online training scheme for FM with the alternating least squares (ALS) technique, which has performance comparable to existing batch training algorithms. We incorporate an online update mechanism for the model parameters at the cost of storing a small cache. As data points come in, the mechanism also stabilizes the training error more than a traditional online training technique such as stochastic gradient descent (SGD), which is crucial for real-time applications. Experiments on large-scale datasets validate the efficiency and robustness of our method.
DOI: https://doi.org/10.1145/3331184.3331374
Published: 2019-07-18
Citations: 0
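The abstract above does not spell out the model that is being trained. As background, here is a minimal sketch of the standard second-order Factorization Machine prediction, using the well-known linear-time rewrite of the pairwise term. It shows only the scoring function, not the paper's streaming ALS update, and the name `fm_predict` and its argument layout are assumptions for illustration.

```python
from typing import Sequence


def fm_predict(
    x: Sequence[float],
    w0: float,
    w: Sequence[float],
    V: Sequence[Sequence[float]],
) -> float:
    """Second-order FM score: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    V[i] is the k-dimensional latent vector of feature i (hypothetical layout).
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} a_i a_j = ((sum a_i)^2 - sum a_i^2) / 2, a_i = V[i][f] * x[i]
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


x = [1.0, 2.0, 1.0]
w = [1.0, 0.0, -1.0]
V = [[1.0, 2.0], [3.0, 1.0], [0.0, 1.0]]
score = fm_predict(x, 0.5, w, V)
print(score)
```

The rewrite keeps scoring at O(nk) instead of O(n^2 k), which is part of why FM models are practical for the sparse, high-dimensional inputs common in recommendation.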
Finding Camouflaged Needle in a Haystack?: Pornographic Products Detection via Berrypicking Tree Model
Guoxiu He, Yangyang Kang, Zhe Gao, Zhuoren Jiang, Changlong Sun, Xiaozhong Liu, Wei Lu, Qiong Zhang, Luo Si
It is an important and urgent research problem for decentralized eCommerce services, e.g., eBay, eBid, and Taobao, to detect illegal products, e.g., unclassified pornographic products. However, it is a challenging task, as some sellers may use and change camouflaged text to deceive the current detection algorithms. In this study, we propose a novel task to dynamically locate pornographic products in very large product collections. Unlike prior product classification efforts focusing on textual information, the proposed model, BerryPIcking TRee MoDel (BIRD), utilizes both product textual content and buyers' seeking behavior information as berrypicking trees. In particular, BIRD encodes both semantic information with respect to the sequence of all branches and the overall latent buyer intent during the whole seeking process. An extensive set of experiments has been conducted to demonstrate the advantage of the proposed model over alternative solutions. To facilitate further research on this practical and important problem, the code and buyers' seeking behavior data have been made publicly available.
DOI: https://doi.org/10.1145/3331184.3331197
Published: 2019-07-18
Citations: 9
Investigating the Interplay Between Searchers' Privacy Concerns and Their Search Behavior
Steven Zimmerman, Alistair Thorpe, C. Fox, Udo Kruschwitz
Privacy concerns are becoming a dominant focus in search applications; thus, there is a growing need to understand the implications of efforts to address these concerns. Our research investigates a search system with privacy warning labels, an approach inspired by decision-making research on food nutrition labels. This approach is designed to alert users to potential privacy threats in their search for information as one possible avenue to address privacy concerns. Our primary goal is to understand the extent to which attitudes towards privacy are linked to behaviors that protect privacy. In the present study, participants were given a set of fact-based decision tasks from the domain of health search. Participants were rotated through variations of search engine results pages (SERPs), including a SERP with a privacy warning light system. Lastly, participants completed a survey to capture attitudes towards privacy, behaviors to protect privacy, and other demographic information. In addition to comparing interactive search behaviors on a privacy warning SERP with those on a control SERP, we compared self-report privacy measures with interactive search behaviors. Participants reported strong concerns around the privacy of health information while simultaneously placing high importance on the correctness of this information. Analysis of our interactive experiment and self-report privacy measures indicates that 1) choice of privacy-protective browsers has a significant link to privacy attitudes and privacy-protective behaviors in a SERP and 2) there are no significant links between reported concerns towards privacy and recorded behavior in an information retrieval system with warnings that enable users to protect their privacy.
DOI: https://doi.org/10.1145/3331184.3331280
Published: 2019-07-18
Citations: 11
A Context-based Framework for Resource Citation Classification in Scientific Literatures
He Zhao, Zhunchen Luo, Chong Feng, Yuming Ye
In this paper, we introduce the task of resource citation classification for scientific literature using a context-based framework. This task is to analyze the purpose of citing an on-line resource in scientific text by modeling the role and function of each resource citation. It can be incorporated into resource indexing and recommendation systems to help better understand and classify on-line resources in scientific literature. We propose a new annotation scheme for this task and develop a dataset of 3,088 manually annotated resource citations. We adopt a neural-based model to build the classifiers and apply them on the large ARC dataset to examine the revolution of scientific resources from trends in their function over time.
DOI: https://doi.org/10.1145/3331184.3331348
Published: 2019-07-18
Citations: 11
Document Distance Metric Learning in an Interactive Exploration Process
Marco Wrzalik
Visualization of inter-document similarities is widely used for the exploration of document collections and interactive retrieval. However, similarity relationships between documents are multifaceted and measured distances by a given metric often do not match the perceived similarity of human beings. Furthermore, the user's notion of similarity can drastically change with the exploration objective or task at hand. Therefore, this research proposes to investigate online adjustments to the similarity model using feedback generated during exploration or exploratory search. In this course, rich visualizations and interactions will support users to give valuable feedback. Based on this, metric learning methodologies will be applied to adjust a similarity model in order to improve the exploration experience. At the same time, trained models are considered as valuable outcomes whose benefits for similarity-based tasks such as query-by-example retrieval or classification will be tested.
DOI: https://doi.org/10.1145/3331184.3331420
Published: 2019-07-18
Citations: 0
On Anonymous Commenting: A Greedy Approach to Balance Utilization and Anonymity for Instagram Users
Arian Askari, Asal Jalilvand, Mahmood Neshati
In many online services, anonymous commenting is not possible for the users; therefore, the users cannot express their critical opinions without disregarding the consequences. For now, naïve approaches are available for anonymous commenting, which cause problems for analytical services on user comments. In this paper, we explore anonymous commenting approaches and their pros and cons. We also propose methods for anonymous commenting that make it possible to protect user privacy while allowing sentimental analytics for service providers. Our experiments, conducted on a real dataset gathered from Instagram comments, indicate the effectiveness of our proposed methods in privacy protection and sentimental analytics. The proposed methods are independent of any particular website and can be utilized in various domains.
DOI: https://doi.org/10.1145/3331184.3331364
Published: 2019-07-18
Citations: 2
Teach Machine How to Read: Reading Behavior Inspired Relevance Estimation
Xiangsheng Li, Jiaxin Mao, Chao Wang, Yiqun Liu, Min Zhang, Shaoping Ma
Retrieval models aim to estimate the relevance of a document to a certain query. Although existing retrieval models have gained much success in both deepening our understanding of information seeking behavior and constructing practical retrieval systems (e.g. Web search engines), we have to admit that the models work in a rather different manner than how humans make relevance judgments. In this paper, we aim to reexamine the existing models as well as to propose new ones based on findings about how humans read documents during relevance judgment. First, we summarize a number of reading heuristics from practical user behavior patterns, which are categorized into implicit and explicit heuristics. By reviewing a variety of existing retrieval models, we find that most of them satisfy only a part of these reading heuristics. To evaluate the effectiveness of each heuristic, we conduct an ablation study and find that most heuristics have positive impacts on retrieval performance. We further integrate all the effective heuristics into a new retrieval model named Reading Inspired Model (RIM). Specifically, implicit reading heuristics are incorporated into the model framework, and explicit reading heuristics are modeled as a Markov Decision Process and learned by reinforcement learning. Experimental results on a large-scale publicly available benchmark dataset and two test sets from NTCIR WWW tasks show that RIM outperforms most existing models, which illustrates the effectiveness of the reading heuristics. We believe that this work contributes to constructing retrieval models with both higher retrieval performance and better explainability.
检索模型的目的是估计文档与某个查询的相关性。尽管现有的检索模型在加深我们对信息寻找行为的理解和构建实用的检索系统(例如Web搜索引擎)方面取得了很大的成功,但我们不得不承认,这些模型的工作方式与人类做出相关性判断的方式相当不同。在本文中,我们的目的是重新审视现有的模型,并提出新的基于人类如何阅读文件在相关性判断的研究结果。首先,我们从实际用户行为模式中总结了一些阅读启发式,它们分为内隐启发式和外显启发式。通过对现有的各种检索模型的回顾,我们发现大多数检索模型只能满足这些阅读启发式的一部分。为了评估每个启发式的有效性,我们进行了一个消融研究,发现大多数启发式对检索性能有积极的影响。我们进一步将所有有效的启发式方法整合到一个新的检索模型中,称为阅读启发模型(RIM)。具体而言,内隐阅读启发式被纳入模型框架,外显阅读启发式被建模为马尔可夫决策过程,并通过强化学习进行学习。在大规模公共基准数据集和NTCIR WWW任务的两个测试集上的实验结果表明,RIM优于大多数现有模型,这说明了阅读启发式算法的有效性。我们相信这项工作有助于构建具有更高检索性能和更好可解释性的检索模型。
DOI: https://doi.org/10.1145/3331184.3331205 (published 2019-07-18)
Citations: 26
Multi-Level Matching Networks for Text Matching
Chunlin Xu, Zhiwei Lin, Shengli Wu, Hui Wang
Text matching aims to establish the matching relationship between two texts. It is an important operation in information retrieval tasks such as duplicate question detection, question answering, and dialog systems. Bidirectional long short-term memory (BiLSTM) coupled with an attention mechanism has achieved state-of-the-art performance in text matching. A major limitation of existing work is that only high-level contextualized word representations are used to obtain word-level matching results, without considering other levels of word representation. This leads to incorrect matching decisions when two words with different meanings lie very close together in the high-level contextualized representation space. Instead of making decisions from single-level word representations, this paper proposes a multi-level matching network (MMN) for text matching, which uses multiple levels of word representations to obtain multiple word-level matching results for the final text-level matching decision. Experimental results on two widely used benchmarks, SNLI and SciTail, show that the proposed MMN achieves state-of-the-art performance.
DOI: https://doi.org/10.1145/3331184.3331276 (published 2019-07-18)
Citations: 8