
Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining: Latest Papers

Improving Session Search by Modeling Multi-Granularity Historical Query Change
Xiaochen Zuo, Zhicheng Dou, Ji-rong Wen
In session search, it is important to use the historical interactions between users and the search engine to improve document retrieval. However, not all historical information contributes to document ranking. Users often express their preferences in how they modify the previous query, which can help us capture the useful information in historical interactions. Inspired by this, we propose modeling historical query change to improve document ranking. Specifically, we characterize multi-granularity query change between each pair of adjacent queries at both the term level and the semantic level. For term-level query change, we calculate three types of term weights: retained-term weights, added-term weights, and removed-term weights. We then perform term-based interaction between the candidate document and the historical queries based on these term weights. For semantic-level query change, we compute an overall representation of user intent by integrating the representations of each historical query obtained with the different types of term weights. We then apply representation-based matching between this representation and the candidate document. To make query change modeling more effective, we introduce query change classification as an auxiliary task. Experimental results on the AOL and TianGong-ST search logs show that our model outperforms most existing models for session search.
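As a rough illustration of the term-level view, the sketch below splits two adjacent queries into retained, added, and removed terms and scores a document with fixed per-category weights. Whitespace tokenization, the helper names, and the weight values (`w_retained`, `w_added`, `w_removed`) are assumptions made for illustration only; the paper learns its term weights, and this is not its model.

```python
def query_change(prev_query: str, curr_query: str) -> dict:
    """Split two adjacent queries into retained, added and removed terms."""
    # Simplifying assumption: lowercase + whitespace tokenization.
    prev_terms = prev_query.lower().split()
    curr_terms = curr_query.lower().split()
    prev_set, curr_set = set(prev_terms), set(curr_terms)
    return {
        "retained": [t for t in curr_terms if t in prev_set],
        "added": [t for t in curr_terms if t not in prev_set],
        "removed": [t for t in prev_terms if t not in curr_set],
    }


def session_changes(queries: list) -> list:
    """Query change for every pair of adjacent queries in a session."""
    return [query_change(a, b) for a, b in zip(queries, queries[1:])]


def toy_doc_score(doc: str, change: dict,
                  w_retained: float = 1.0,
                  w_added: float = 1.5,
                  w_removed: float = -0.5) -> float:
    """Hypothetical fixed weights: reward added terms most, penalize removed ones."""
    doc_terms = set(doc.lower().split())
    score = w_retained * sum(t in doc_terms for t in change["retained"])
    score += w_added * sum(t in doc_terms for t in change["added"])
    score += w_removed * sum(t in doc_terms for t in change["removed"])
    return score


change = query_change("cheap flights paris", "cheap flights rome hotels")
print(change)
print(toy_doc_score("cheap hotels in rome", change))  # 4.0
print(toy_doc_score("flights to paris", change))      # 0.5
```

The document that matches the newly added terms ranks higher than the one that matches the term the user dropped, which is the intuition the abstract describes.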
DOI: https://doi.org/10.1145/3488560.3498415 · Published: 2022-02-11 · Citations: 7
Exploration in Recommender Systems
Minmin Chen
In an era of ever-increasing choice, recommender systems have become indispensable in helping users navigate the millions or billions of pieces of content on recommendation platforms. Most recommender systems are powered by ML models trained on large amounts of user-item interaction data. Such a setup, however, induces a strong feedback loop that creates a rich-get-richer phenomenon: head content receives more and more exposure while tail and fresh content goes undiscovered. At the same time, it pigeonholes users into content they are already familiar with. We believe exploration is key to breaking this feedback loop and optimizing long-term user experience on recommendation platforms. The exploration-exploitation tradeoff, the foundation of bandit and reinforcement learning (RL) research, has been studied extensively. While effective exploration is believed to improve user experience on the platform, the exact value of exploration in recommender systems has not been well established. In this talk, we examine the roles of exploration in recommender systems in three facets: 1) system exploration, which surfaces fresh or tail recommendations based on users' known interests; 2) user exploration, which identifies unknown user interests or introduces users to new interests; and 3) online exploration, which uses real-time user feedback to reduce extrapolation errors when performing system and user exploration. We discuss the measurement and optimization challenges of the different types of exploration and propose initial solutions. Through offline and live experiments on industrial recommendation platforms, we show how each aspect of exploration contributes to long-term user experience. We hope this talk inspires more follow-up work on understanding and improving exploration in recommender systems.
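The exploration-exploitation tradeoff described in this talk abstract can be illustrated with UCB1, a textbook bandit rule. The sketch below is not the method from the talk: it treats each candidate item as an arm and adds a confidence bonus that shrinks as an item is shown more often, so a rarely shown tail item can outrank a frequently shown head item. The function names and the constant `c` are illustrative assumptions.

```python
import math


def ucb1_select(counts: list, mean_rewards: list, c: float = math.sqrt(2)) -> int:
    """Return the index of the arm (item) with the highest UCB1 score."""
    # Show every item at least once before trusting the estimates.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Estimated reward plus an exploration bonus that decays with exposure.
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(mean_rewards, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts: list, mean_rewards: list, arm: int, reward: float) -> None:
    """Incrementally update the running mean reward of the shown arm."""
    counts[arm] += 1
    mean_rewards[arm] += (reward - mean_rewards[arm]) / counts[arm]


# Arm 0: a "head" item shown 100 times; arm 1: a "tail" item shown once.
print(ucb1_select([100, 1], [0.5, 0.4]))  # 1: the exploration bonus favors the tail item
```

With equal exposure the bonus terms cancel and the rule falls back to pure exploitation, which is why such schemes need exposure counts that reflect real serving logs.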
DOI: https://doi.org/10.1145/3488560.3510009 · Published: 2022-02-11 · Citations: 1
Quality Assurance of a German COVID-19 Question Answering Systems using Component-based Microbenchmarking
A. Both, Paul Heinze, A. Perevalov, Johannes Richard Bartsch, Rostislav Iudin, Johannes Rudolf Herkner, Tim Schrader, Jonas Wunsch, René Gürth, Ann Kristin Falkenhain
Question Answering (QA) has become a widely used method for retrieving data in chatbots and other natural-language user interfaces. QA systems run by official institutions in particular face high expectations regarding the answers they compute, as the information provided might be critical. In this demonstration, we use the official COVID-19 QA system, developed together with the German federal government, which gives German citizens access to data such as incidence values and the number of deaths. To ensure high quality, the system follows a component-based approach in which QA components exchange data using RDF and the functionality of the QA system is validated using SPARQL. We demonstrate how our solution enables QA system developers to use a descriptive approach to validate the quality of their implementation both before deployment and in a live environment.
DOI: https://doi.org/10.1145/3488560.3502196 · Published: 2022-02-11 · Citations: 1
Modern Theoretical Tools for Understanding and Designing Next-generation Information Retrieval System
Da Xu, Chuanwei Ruan
In the relatively short history of machine learning, the subtle balance between engineering and theoretical progress has proved critical at various stages. The most recent wave of AI has brought powerful techniques to the IR community, particularly for pattern recognition. While many benefit from the burst of ideas as numerous tasks become algorithmically feasible, the balance is tilting toward the application side. The existing theoretical tools in IR can no longer explain, guide, or justify the newly established methodologies. Left with no choice, we have to bet our designs on black-box mechanisms that we understand only empirically. The consequences can be painful: in stark contrast to the IR industry's vision of modern AI making life easier, many are experiencing growing confusion and costs in data manipulation, model selection, monitoring, censoring, and decision making. This is not surprising: without handy theoretical tools, we often lack principled knowledge of a pattern recognition model's expressivity, optimization properties, and generalization guarantees, so our decision-making process has to rely on oversimplified assumptions and human judgment from time to time. Facing these challenges, we began researching advanced theoretical tools emerging from various domains that could help resolve modern IR problems. We encountered many impactful ideas and published several independent papers on different pieces. Now is the time to bring the community a systematic tutorial on how we successfully adapted those tools and made significant progress in understanding, designing, and eventually productionizing impactful IR systems. We emphasize systematicity because IR is a comprehensive discipline that touches on particular aspects of learning, causal inference, interactive (online) decision-making, and more. Imported theoretical tools therefore need systematic calibration before they become genuinely useful for IR problems, which usually exhibit unique structures and definitions. We have accordingly designed this tutorial to systematically present what we learned and our successful experience using advanced theoretical tools to understand and design IR systems.
DOI: https://doi.org/10.1145/3488560.3501394 · Published: 2022-02-11 · Citations: 1
Learning Transferable Node Representations for Attribute Extraction from Web Documents
Yichao Zhou, Ying Sheng, N. Vo, Nick Edmonds, Sandeep Tata
Given a web page, extracting an object along with its attributes of interest (e.g., price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Prior approaches have either relied on computationally expensive visual feature engineering or required large amounts of training data to reach acceptable precision. In this paper, we propose a novel method, LeArNing TransfErable node RepresentatioNs for Attribute Extraction (LANTERN), to tackle this problem. We model the problem as a tree node tagging task. The key insight is to learn a contextual representation for each node in the DOM tree, where the context explicitly takes into account the tree structure of the neighborhood around the node. Experiments on the public SWDE dataset show that LANTERN outperforms the previous state of the art (SOTA) by 1.44% in F1 score with a dramatically simpler model architecture. Furthermore, we report that using data from a different domain (for instance, training on web pages about cars to extract book objects) is surprisingly useful and helps beat the SOTA by a further 1.37%.
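To make the idea of a node's tree context concrete, the following sketch uses Python's standard `html.parser` module to list every text node of a page together with its ancestor tag path and its neighboring text nodes in document order. These hand-built features are only a stand-in chosen for illustration (the names `TextNodeCollector` and `node_contexts` are hypothetical); LANTERN learns its contextual node representations rather than enumerating them like this.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextNodeCollector(HTMLParser):
    """Records each non-empty text node with its ancestor tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, root first
        self.nodes = []   # (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def node_contexts(html: str, window: int = 1) -> list:
    """Pair every text node with its tag path and up to `window` neighbors per side."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    nodes = parser.nodes
    contexts = []
    for i, (path, text) in enumerate(nodes):
        contexts.append({
            "path": path,
            "text": text,
            "left": [t for _, t in nodes[max(0, i - window):i]],
            "right": [t for _, t in nodes[i + 1:i + 1 + window]],
        })
    return contexts


page = ("<html><body><div><span>Price:</span><span>$12.99</span></div>"
        "<p>Author: <b>Jane Doe</b></p></body></html>")
for ctx in node_contexts(page):
    print(ctx)
```

Here the value node `$12.99` carries both its structural position (`html/body/div/span`) and the neighboring label `Price:`, the kind of signal a tagger could use to recognize it as a price attribute.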
DOI: https://doi.org/10.1145/3488560.3498424 · Published: 2022-02-11 · Citations: 7
The Pit Stop Problem: How to Plan Your Next Road Trip
Kostas Kollias
Many online trip-planning and navigation applications routinely need to decide where to stop during a journey for services such as refueling (or EV charging), rest, and food. The goal is to minimize the overhead of these stops while ensuring that the traveler never runs short of an essential resource (such as fuel, rest, or food) during the journey. In this paper, we formally model this problem and call it the pit stop problem. We design algorithms for it under various settings: single vs. multiple types of stops, and offline vs. online optimization (i.e., planning in advance or during the trip). Our algorithms achieve provable guarantees on how closely they approximate the optimal solution. We then evaluate our algorithms extensively on real-world data and show that they significantly outperform baseline solutions.
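For intuition about the simplest setting (one stop type, planned offline), here is the classic greedy rule for a fixed-range vehicle with stations at known distances along a fixed route: always drive to the farthest station reachable on a full tank. Under these assumptions (a full refill at every stop, and the cost is simply the number of stops) the greedy rule minimizes the number of stops. It is a textbook baseline, not the algorithm from the paper, which handles more general stop overheads and online settings; the function name `plan_stops` is hypothetical.

```python
def plan_stops(stations: list, destination: float, max_range: float):
    """Greedy 'farthest reachable station' rule for a single stop type.

    Assumes a full refill at every stop and that the cost to minimize is the
    number of stops. Returns the stop positions, or None if some gap between
    consecutive stations exceeds max_range.
    """
    # Stations strictly between the start (0) and the destination, deduplicated and sorted.
    points = sorted({s for s in stations if 0 < s < destination}) + [destination]
    stops = []
    position = 0.0
    i = 0
    while position + max_range < destination:
        farthest = None
        # Advance through every point reachable from the current position.
        while i < len(points) and points[i] <= position + max_range:
            farthest = points[i]
            i += 1
        if farthest is None:
            return None  # the next station is out of range
        stops.append(farthest)
        position = farthest
    return stops


print(plan_stops([100, 200, 300, 450], destination=500, max_range=250))  # [200, 450]
```

Delaying each stop as long as possible never leaves the traveler worse off than an earlier stop would, which is the exchange argument behind the greedy rule's optimality in this restricted setting.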
DOI: https://doi.org/10.1145/3488560.3508495 · Published: 2022-02-11 · Citations: 0
Scalable Graph Topology Learning via Spectral Densification
Yongyu Wang, Zhiqiang Zhao, Zhuo Feng
Graph learning plays an important role in many data mining and machine learning tasks, such as manifold learning, data representation and analysis, dimensionality reduction, data clustering, and visualization. In this work, we introduce a highly scalable spectral graph densification approach (GRASPEL) for learning graph topology from data. By restricting the precision matrix to a graph-Laplacian-like matrix, our approach aims to learn sparse undirected graphs from potentially high-dimensional input data. A distinctive property of the graphs learned by GRASPEL is that spectral embedding (or approximate effective-resistance) distances on the graph encode the similarities between the original input data points. By leveraging high-performance spectral methods, sparse yet spectrally robust graphs can be learned by identifying the most spectrally critical edges and adding them to the graph. Compared with prior state-of-the-art graph learning approaches, GRASPEL is more scalable and substantially improves the computing efficiency and solution quality of a variety of data mining and machine learning applications, such as manifold learning, spectral clustering (SC), and dimensionality reduction (DR).
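The effective-resistance distance mentioned in this abstract can be computed from the Moore-Penrose pseudoinverse of the graph Laplacian: R(i, j) = (e_i - e_j)^T L^+ (e_i - e_j). The NumPy sketch below builds a weighted Laplacian and, as a deliberately simplified densification step, picks the candidate edge whose endpoints are currently farthest apart in resistance distance. That selection rule and the function names are assumptions for illustration; GRASPEL's actual edge criterion and its scalable spectral solvers are not reproduced here, and a dense pseudoinverse is only practical for small graphs.

```python
import numpy as np


def laplacian(n: int, edges: list) -> np.ndarray:
    """Weighted graph Laplacian L = D - W from (u, v, weight) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def resistance_distance(L_pinv: np.ndarray, i: int, j: int) -> float:
    """Effective resistance (e_i - e_j)^T L^+ (e_i - e_j)."""
    e = np.zeros(L_pinv.shape[0])
    e[i], e[j] = 1.0, -1.0
    return float(e @ L_pinv @ e)


def most_critical_edge(L: np.ndarray, candidates: list) -> tuple:
    """Hypothetical densification step: pick the candidate pair that is
    currently farthest apart in resistance distance."""
    L_pinv = np.linalg.pinv(L)  # dense pseudoinverse: small graphs only
    return max(candidates, key=lambda pair: resistance_distance(L_pinv, pair[0], pair[1]))


# Path graph 0-1-2-3 with unit weights: resistances add in series.
L = laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
print(round(resistance_distance(np.linalg.pinv(L), 0, 3), 6))  # 3.0
print(most_critical_edge(L, [(0, 2), (0, 3), (1, 3)]))          # (0, 3)
```

Adding the edge (0, 3) closes the path into a cycle and shrinks the largest resistance distances the most, which is the intuition behind adding spectrally critical edges first.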
DOI: https://doi.org/10.1145/3488560.3498480 · Published: 2022-02-11 · Citations: 10
Diversified Query Generation Guided by Knowledge Graph
Xi Shen, Jiangjie Chen, Jiaze Chen, Chun Zeng, Yanghua Xiao
Relevant-article recommendation plays an important role on online news platforms. Directly displaying the articles recalled by a search engine reflects little understanding of article content. Generating clickable queries, on the other hand, summarizes an article from various aspects, and these queries can then be used to better connect relevant articles. Most existing approaches to generating article queries, however, consider neither the diversity of the queries nor whether they are appealing enough, both of which are essential for improving user experience and driving platform traffic. To this end, we propose a Knowledge-Enhanced Diversified QuerY Generator (KEDY), which uses an external knowledge graph (KG) as guidance. We diversify query generation with information about the semantic neighbors of the entities in an article. We further constrain the diversification process with entity popularity knowledge to build appealing queries that users may be more interested in. Information within the KG is propagated toward more popular entities via popularity-guided graph attention. We collect a news-query dataset from the search logs of a real-world search engine. Extensive experiments demonstrate that KEDY generates more diversified and insightful related queries than several strong baselines.
DOI: https://doi.org/10.1145/3488560.3498431 · Published: 2022-02-11
Citations: 7
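The KEDY abstract says KG information is propagated toward more popular entities through popularity-guided graph attention, but it gives no formula. The Python sketch below shows one hypothetical way such attention could work. It assumes each neighbor's score is its dot-product similarity to the query entity plus a weighted log-popularity bonus. The function `popularity_guided_attention`, the parameter `popularity_weight`, and the scoring rule are illustrative assumptions, not the authors' model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def popularity_guided_attention(
    query: Vector,
    neighbors: Dict[str, Tuple[Vector, int]],
    popularity_weight: float = 1.0,
) -> Tuple[Vector, Dict[str, float]]:
    """Aggregate neighbor embeddings, biasing attention toward popular entities.

    `neighbors` maps an entity name to (embedding, popularity count).
    Hypothetical score: dot(query, embedding) + popularity_weight * log(1 + popularity).
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    names = list(neighbors)
    scores = [
        dot(query, neighbors[n][0]) + popularity_weight * math.log1p(neighbors[n][1])
        for n in names
    ]
    weights = softmax(scores)
    # Weighted sum of neighbor embeddings using the attention weights.
    aggregated = [0.0] * len(query)
    for w, n in zip(weights, names):
        for i, value in enumerate(neighbors[n][0]):
            aggregated[i] += w * value
    return aggregated, dict(zip(names, weights))
```

When two neighbors have identical embeddings, the more popular one receives most of the attention weight; setting `popularity_weight=0.0` reduces the sketch to plain dot-product attention.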
Unsupervised Cross-Domain Adaptation for Response Selection Using Self-Supervised and Adversarial Training
Jia Li, Chongyang Tao, Huang Hu, Can Xu, Yining Chen, Daxin Jiang
Recently, many neural context-response matching models have been developed for retrieval-based dialogue systems. Although existing models achieve impressive performance by learning from large amounts of in-domain parallel dialogue data, they usually perform worse in a new domain. How to transfer a response retrieval model trained in high-resource domains to low-resource domains is a crucial problem for scalable dialogue systems. To this end, we investigate unsupervised cross-domain adaptation for response selection when the target domain has no parallel dialogue data. Specifically, we propose a two-stage method that adapts a response selection model to a new domain using self-supervised and adversarial training based on pre-trained language models (PLMs). To efficiently incorporate domain awareness and target-domain knowledge into PLMs, we first design a self-supervised post-training procedure consisting of a domain discrimination (DD) task, a target-domain masked language model (MLM) task, and a target-domain next sentence prediction (NSP) task. Building on this, we then perform adversarial fine-tuning so that the model matches the proper response using extracted domain-shared features as much as possible. Experimental results show that our method achieves consistent and significant improvements on several cross-domain response selection datasets.
DOI: https://doi.org/10.1145/3488560.3498404 · Published: 2022-02-11
Citations: 5
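The post-training stage in the response-selection abstract combines three self-supervised tasks. As a rough illustration of how training examples for them might be built from unlabeled dialogues, the Python sketch below labels utterances by domain (DD), pairs target-domain contexts with their true next utterance or a randomly sampled utterance from another dialogue (NSP), and masks whitespace tokens (MLM). The function names, the whitespace tokenization, and the negative-sampling scheme are assumptions rather than details from the paper; the adversarial fine-tuning stage is not covered.

```python
import random
from typing import List, Tuple

Dialogue = List[str]  # a dialogue is an ordered list of utterances


def build_dd_examples(source: List[Dialogue], target: List[Dialogue]) -> List[Tuple[str, int]]:
    """Domain discrimination: label each utterance 0 (source domain) or 1 (target domain)."""
    examples = [(u, 0) for d in source for u in d]
    examples += [(u, 1) for d in target for u in d]
    return examples


def build_nsp_examples(target: List[Dialogue], rng: random.Random) -> List[Tuple[str, str, int]]:
    """Target-domain NSP: (context, true next utterance, 1) plus (context, random other utterance, 0)."""
    examples = []
    for i, dialogue in enumerate(target):
        # Negatives are drawn only from other dialogues in the target domain.
        others = [u for j, d in enumerate(target) if j != i for u in d]
        for t in range(1, len(dialogue)):
            context = " ".join(dialogue[:t])
            examples.append((context, dialogue[t], 1))
            if others:
                examples.append((context, rng.choice(others), 0))
    return examples


def mask_tokens(
    text: str, rng: random.Random, rate: float = 0.15, mask: str = "[MASK]"
) -> Tuple[str, List[str]]:
    """Target-domain MLM: replace roughly `rate` of whitespace tokens with `mask`.

    Returns the masked text and the original tokens that were masked, in order.
    """
    tokens = text.split()
    labels = []
    for k, tok in enumerate(tokens):
        if rng.random() < rate:
            labels.append(tok)
            tokens[k] = mask
    return " ".join(tokens), labels
```

Passing a seeded `random.Random` keeps the sampled negatives and masks reproducible across runs.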
Personalized Information Retrieval for Touristic Attractions in Augmented Reality
Felix Yang, Saikishore Kalloori, Ribin Chalumattu, Markus Gross
The rapid advances and increasing accessibility of augmented reality (AR) in recent years have opened up many new possibilities for incorporating AR into daily life. A particularly interesting area for AR is tourism, where attractions can be enhanced with virtual elements and tourists can be given additional information about the places they are visiting. In this paper, we present our prototype, an AR application that augments various points of interest (POIs) by showing images and facts about each POI. We also developed a simple recommender system that selects these facts based on user preferences, creating a unique and personalized experience for each user. Finally, we conducted a live user study to assess the usability of our prototype and the usefulness of our personalization system.
DOI: https://doi.org/10.1145/3488560.3502194 · Published: 2022-02-11
Citations: 0
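The AR abstract describes only a "simple recommender system" that picks facts according to user preferences. One minimal way to do this, sketched below in Python, assumes each fact carries category tags and each user has a weight per category: facts are scored by their summed tag weights and the top `k` are shown. `Fact`, `score_fact`, `select_facts`, and the tag scheme are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fact:
    text: str
    tags: List[str] = field(default_factory=list)  # hypothetical category tags


def score_fact(fact: Fact, preferences: Dict[str, float]) -> float:
    # Sum the user's weight for each tag on the fact; unknown tags count as 0.
    return sum(preferences.get(tag, 0.0) for tag in fact.tags)


def select_facts(facts: List[Fact], preferences: Dict[str, float], k: int = 3) -> List[Fact]:
    # sorted() is stable even with reverse=True, so ties keep their original order.
    ranked = sorted(facts, key=lambda f: score_fact(f, preferences), reverse=True)
    return ranked[:k]
```

For a user who weights `history` highly, a fact tagged both `architecture` and `history` ranks above one tagged only `history`; a user with no stated preferences simply sees the facts in their curated order.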