
Proceedings of the 13th International Conference on Web Search and Data Mining: Latest Articles

Learning from Heterogeneous Networks: Methods and Applications
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372182
Chuxu Zhang
Complex systems in different disciplines are usually modeled as heterogeneous networks. Unlike homogeneous or attributed networks, heterogeneous networks exhibit complexity in their structure, their content, or both. The abundant information in heterogeneous networks provides opportunities, yet also poses challenges, for researchers and practitioners developing customized machine learning solutions to different problems in complex systems. This motivates our work on learning from heterogeneous networks. In this paper, we first introduce the motivation and background of this research. We then present our current work, which includes a series of proposed methods and applications. These methods are introduced from two perspectives: personalization in web-based systems and heterogeneous network embedding. Finally, we outline several research directions as a future agenda.
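The sketch below makes "heterogeneous structure" concrete. It is not taken from this paper. It shows a typed network and a meta-path-guided random walk, a building block commonly used in heterogeneous network embedding. The toy author/paper/venue graph, the `metapath_walk` function, and the "APVPA" meta-path are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy heterogeneous network: each node has a type
# ("A" = author, "P" = paper, "V" = venue).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]


def build_adjacency(edges):
    """Build an undirected adjacency list from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose k-th node must have type pattern[k % len(pattern)].

    `metapath` is symmetric (first type == last type, e.g. "APVPA"), so the
    pattern without its last symbol repeats cyclically along the walk.
    """
    if node_type[start] != metapath[0] or metapath[0] != metapath[-1]:
        raise ValueError("start type must match a symmetric meta-path")
    pattern = metapath[:-1]
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


adj = build_adjacency(EDGES)
walk = metapath_walk(adj, NODE_TYPE, "a1", "APVPA", length=9, rng=random.Random(0))
print(walk)
```

Walks like these could then be fed to a skip-gram style embedding model. The type constraint keeps every walk consistent with the chosen relation pattern.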
Citations: 2
Deep Multi-Graph Clustering via Attentive Cross-Graph Association
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371806
Dongsheng Luo, Jingchao Ni, Suhang Wang, Yuchen Bian, Xiong Yu, Xiang Zhang
Multi-graph clustering aims to improve clustering accuracy by leveraging information from different domains, and has been shown to be highly effective, achieving better clustering results than single-graph clustering algorithms. Despite this success, existing multi-graph clustering methods mostly use shallow models, which cannot capture the highly non-linear structures and complex cluster associations in multi-graphs, and thus yield sub-optimal results. Inspired by the powerful representation learning capability of neural networks, in this paper we propose an end-to-end deep learning model that simultaneously infers cluster assignments and cluster associations in multi-graphs. Specifically, we use autoencoding networks to learn node embeddings. Meanwhile, we propose a minimum-entropy based clustering strategy to cluster nodes in the embedding space of each graph. We introduce two regularizers to leverage both within-graph and cross-graph dependencies. An attentive mechanism is further developed to learn cross-graph cluster associations. Through extensive experiments on a variety of datasets, we observe that our method outperforms state-of-the-art baselines by a large margin.
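The abstract does not spell out its minimum-entropy clustering objective, so the following is only a hedged illustration under our own assumptions. Nodes are softly assigned to centroids with a softmax over negative squared distances, using temperature `tau`. The objective is the average per-node assignment entropy, which is low when each node sits confidently in one cluster. The names `soft_assign` and `mean_assignment_entropy` are hypothetical.

```python
import math


def soft_assign(z, centroids, tau=1.0):
    """Softmax over negative squared distances from embedding z to each centroid."""
    logits = [-sum((a - b) ** 2 for a, b in zip(z, c)) / tau for c in centroids]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mean_assignment_entropy(embeddings, centroids, tau=1.0):
    """Average per-node entropy; a minimum-entropy objective drives this down."""
    return sum(entropy(soft_assign(z, centroids, tau)) for z in embeddings) / len(embeddings)


centroids = [(0.0, 0.0), (4.0, 4.0)]
tight = [(0.1, 0.0), (3.9, 4.1)]      # nodes close to one centroid
ambiguous = [(2.0, 2.0), (2.1, 1.9)]  # nodes halfway between the centroids
print(mean_assignment_entropy(tight, centroids), mean_assignment_entropy(ambiguous, centroids))
```

In practice, an objective like this is often paired with a term that discourages every node from collapsing into a single cluster. That term is omitted here.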
Citations: 20
WebShapes: Network Visualization with 3D Shapes
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371867
Shengmin Jin, Richard Wituszynski, Max Caiello-Gingold, R. Zafarani
Network visualization has played a critical role in graph analysis, as it not only presents the big picture of a network but also helps reveal its structural information. The most popular visual representation of networks is the node-link diagram. However, visualizing a large network with a node-link diagram can be challenging because an optimal graph layout is difficult to obtain. To address this challenge, a recent advance in network representation, network shapes, allows one to compactly represent a network and its subgraphs by the distribution of their embeddings. Inspired by this research, we have designed WebShapes, a web platform that enables researchers and practitioners to visualize their network data as customized 3D shapes (http://b.link/webshapes). Furthermore, we provide a case study on real-world networks exploring the sensitivity of network shapes to different graph sampling, embedding, and fitting methods, and we show examples of understanding networks through their network shapes.
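The abstract describes a network shape as the distribution of a network's embeddings, and compares different fitting methods. One simple fitting choice, assumed here and not necessarily what WebShapes uses, is a multivariate Gaussian fitted to 3D node embeddings. Its mean and covariance define an ellipsoid that could be rendered as a shape. The embeddings and `fit_gaussian_3d` are hypothetical.

```python
def fit_gaussian_3d(points):
    """Return (mean, covariance) of 3D points; covariance uses the n - 1 denominator."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean = [sum(p[d] for p in points) / n for d in range(3)]
    cov = [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1) for j in range(3)]
        for i in range(3)
    ]
    return mean, cov


# Hypothetical 3D embeddings of four nodes (from any embedding method).
emb = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
mean, cov = fit_gaussian_3d(emb)
print(mean, cov)
```

Here the zero variance along the third axis would make the fitted ellipsoid flat. This is the kind of structural cue a shape-based view is meant to expose.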
Citations: 0
User Recommendation in Content Curation Platforms
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371822
Jianling Wang, Ziwei Zhu, James Caverlee
We propose a personalized user recommendation framework for content curation platforms that simultaneously models preferences for both users and the items they engage with. In this way, user preferences for specific item types (e.g., fantasy novels) can be balanced with user specialties (e.g., reviewing novels with strong female protagonists). In particular, the proposed model has three unique characteristics: (i) it simultaneously learns both user-item and user-user preferences through a multi-aspect autoencoder model; (ii) it fuses the latent representations of user preferences on users and items to construct shared factors through an adversarial framework; and (iii) it incorporates an attention layer to produce weighted aggregations of different latent representations, leading to improved personalized recommendation of users and items. In experiments against state-of-the-art models, we find that the proposed framework improves top-k user recommendation by 18.43% (Goodreads) and 6.14% (Spotify).
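Characteristic (iii) mentions an attention layer that produces weighted aggregations of latent representations. Below is a minimal sketch of that general idea. It assumes dot-product scores against a query vector followed by a softmax. In a trained model the query and the representations would be learned; here they are fixed, hypothetical values, and `attention_aggregate` is our own name.

```python
import math


def attention_aggregate(reps, query):
    """Softmax(query . rep) weights, then the weighted sum of the representations."""
    scores = [sum(q * r for q, r in zip(query, rep)) for rep in reps]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * rep[d] for w, rep in zip(weights, reps)) for d in range(len(reps[0]))]
    return weights, fused


# Hypothetical latent views of one user, e.g. from user-item and user-user signals.
user_item_view = [1.0, 0.0]
user_user_view = [0.0, 1.0]
weights, fused = attention_aggregate([user_item_view, user_user_view], query=[2.0, 0.0])
print(weights, fused)
```

The view that aligns better with the query receives the larger weight. This lets the model lean on whichever latent representation is more informative.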
Citations: 7
Deep Bayesian Data Mining
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371870
Jen-Tzung Chien
This tutorial addresses the fundamentals and advances in deep Bayesian mining and learning for natural language, with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommendation systems, question answering and machine translation, to name a few. Traditionally, "deep learning" is taken to be a learning process in which inference or optimization is based on real-valued deterministic models. The "semantic structure" in words, sentences, entities, actions and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The "distribution function" in discrete or continuous latent variable models for natural language may not be properly decomposed or estimated. This tutorial covers the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian and deep models. These include the hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network (RNN), long short-term memory (LSTM), sequence-to-sequence model, variational auto-encoder (VAE), generative adversarial network (GAN), attention mechanism, memory-augmented neural network, skip neural network, temporal difference VAE, stochastic neural network, stochastic temporal convolutional network, predictive state neural network, and policy neural network. Enhancement of the prior/posterior representation is also addressed. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. Word and sentence embeddings, clustering and co-clustering are combined with linguistic and semantic constraints. A series of case studies, tasks and applications is presented to tackle different issues in deep Bayesian mining, searching, learning and understanding. Finally, we point out a number of directions and outlooks for future studies. This tutorial aims to introduce novices to major topics within deep Bayesian learning, to motivate and explain a topic of emerging importance for data mining and natural language understanding, and to present a novel synthesis combining distinct lines of machine learning work.
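Of the Bayesian nonparametric models listed, the Chinese restaurant process is the easiest to illustrate. The sketch below samples seating under the standard CRP rule with concentration `alpha`. It is a generic textbook sampler, not material from the tutorial, and the function name is our own.

```python
import random


def chinese_restaurant_process(n_customers, alpha, rng):
    """Sample table assignments from a CRP with concentration `alpha`.

    With i customers already seated, the next one joins existing table k with
    probability counts[k] / (i + alpha) or opens a new table with probability
    alpha / (i + alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    counts = []       # counts[k] = customers seated at table k
    assignments = []  # assignments[i] = table chosen by customer i
    for i in range(n_customers):
        r = rng.uniform(0, i + alpha)
        table = len(counts)  # default: open a new table
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                table = k
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments, counts


assignments, counts = chinese_restaurant_process(20, alpha=1.0, rng=random.Random(7))
print(len(counts), "tables for", sum(counts), "customers")
```

A larger `alpha` opens new tables more often. For a fixed `alpha`, the expected number of occupied tables grows roughly logarithmically in the number of customers.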
Citations: 0
LARA: Attribute-to-feature Adversarial Learning for New-item Recommendation
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371805
Changfeng Sun, Han Liu, Meng Liu, Z. Ren, Tian Gan, Liqiang Nie
Recommending new items in real-world e-commerce portals is challenging because of the cold-start phenomenon, i.e., the lack of user-item interactions. To address this problem, we propose a novel recommendation model, an adversarial neural network with multiple generators, that generates users from multiple perspectives of items' attributes. The generated users are represented by attribute-level features. Since both users and items have attribute-level representations, we can implicitly obtain user-item attribute-level interaction information. In light of this, a new item can be recommended to users based on attribute-level similarity. Extensive experiments on two item cold-start scenarios, movie and goods recommendation, verify the effectiveness of the proposed model compared with state-of-the-art baselines.
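The final step described above, matching a new item to users by attribute-level similarity, can be sketched as follows. We assume the generators have already produced a user profile for the new item, in the same attribute-level feature space as real users. The feature values and the stand-in `generated` vector are hypothetical, and no adversarial training is shown. Cosine similarity is our assumed similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend_new_item(generated_user, user_vectors, k):
    """Rank real users by similarity to the user profile generated for a new item."""
    ranked = sorted(
        user_vectors,
        key=lambda uid: cosine(generated_user, user_vectors[uid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical attribute-level user features, e.g. affinity for [action, comedy, drama].
users = {"u1": [0.9, 0.1, 0.0], "u2": [0.0, 0.2, 0.9], "u3": [0.7, 0.6, 0.1]}
# Stand-in for a generator's output for a new action movie (not a trained model).
generated = [1.0, 0.0, 0.0]
print(recommend_new_item(generated, users, k=2))
```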
Citations: 36
Parameter Tuning in Personal Search Systems
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371820
S. Chen, Xuanhui Wang, Zhen Qin, Donald Metzler
Retrieval effectiveness in information retrieval systems depends heavily on how various parameters are tuned. One option for finding these parameters is to run multiple online experiments and use a parameter sweep to optimize the search system. This approach has multiple downsides, chiefly that it may lead to a poor experience for users. Another option is offline evaluation, which can act as a safeguard against potential quality issues. Offline evaluation requires a validation data set that can be benchmarked against different parameter settings. However, for search over personal corpora, e.g. email and file search, it is impractical and often impossible to obtain a complete, representative validation set, because raw queries and document information cannot be saved. In this work, we show how to do offline parameter tuning with only a partial validation set. In addition, we demonstrate how to tune parameters both when we have complete knowledge of the internal implementation of the search system (white-box tuning) and when we have only partial knowledge (grey-box tuning). This has allowed us to do offline parameter tuning in a privacy-sensitive manner.
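A white-box offline parameter sweep can be sketched as a grid search that scores each candidate parameter value on a validation set. The sketch assumes a hypothetical linear ranking function `w * f1 + (1 - w) * f2`. It also assumes validation records that keep only per-candidate feature values and the index of the clicked result, not raw queries or documents. It does not reproduce the paper's method for handling partial validation sets.

```python
def mrr(validation, w):
    """Mean reciprocal rank of the clicked candidate under score = w*f1 + (1-w)*f2."""
    total = 0.0
    for candidates, clicked in validation:
        scores = [w * f1 + (1 - w) * f2 for f1, f2 in candidates]
        # Rank = 1 + number of candidates scoring strictly higher than the clicked one.
        rank = 1 + sum(1 for s in scores if s > scores[clicked])
        total += 1.0 / rank
    return total / len(validation)


def sweep(validation, grid):
    """Return the grid value with the highest offline MRR (ties go to the earliest)."""
    return max(grid, key=lambda w: mrr(validation, w))


# Hypothetical privacy-friendly validation records: per-candidate feature pairs
# (f1 = text match, f2 = recency) plus the index of the clicked candidate.
validation = [
    ([(0.65, 0.10), (0.10, 0.55)], 0),
    ([(0.10, 0.75), (0.45, 0.10)], 1),
    ([(0.45, 0.10), (0.20, 0.85)], 1),
]
grid = [i / 10 for i in range(11)]
best_w = sweep(validation, grid)
print(best_w, mrr(validation, best_w))
```

On this toy data, only `w = 0.7` ranks the clicked result first for all three records, so the sweep selects it.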
Citations: 1
Adversarial Machine Learning in Recommender Systems (AML-RecSys)
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371877
Yashar Deldjoo, T. D. Noia, Felice Antonio Merra
Recommender systems (RS) are an integral part of many online services that aim to provide an enhanced user-oriented experience. Machine learning (ML) models are now broadly adopted in state-of-the-art approaches to recommendation. These models are typically trained to maximize a user-centred utility (e.g., user satisfaction) or a business-oriented one (e.g., profitability or sales increase). They work under the main assumption that users' historical feedback can serve as proper ground truth for model training and evaluation. However, driven by successes in the ML community, recent advances show that state-of-the-art recommendation approaches, such as matrix factorization (MF) models or those based on deep neural networks, can be vulnerable to adversarial perturbations applied to the input data. These adversarial samples can impair the training of high-quality MF models and can put the success of these approaches at high risk. As a result, a new paradigm of secure training for RS has emerged that takes the presence of adversarial samples into account in the recommendation process. We present adversarial machine learning in recommender systems (AML-RecSys), which concerns the study of effective ML techniques in RS to counter an adversarial component. AML-RecSys has been proposed in two main forms within the RS literature: (i) adversarial regularization, which attempts to counter adversarial perturbations added to the input data or model parameters of an RS, and (ii) generative adversarial network (GAN)-based models, which adopt a generative process to train powerful ML models. We discuss a theoretical framework that unifies these two families of models via a minimax game between an adversarial component and a discriminator. Furthermore, we explore various examples illustrating the successful application of AML to various RS tasks. Finally, we present a global taxonomy/overview of the academic literature based on several identified dimensions, namely (i) research goals and challenges, (ii) application domains and (iii) technical overview.
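Adversarial regularization for matrix factorization is often built from a fast-gradient perturbation of model parameters. The sketch below rests on our own assumptions: a BPR pairwise loss, a perturbation applied only to the user embedding, and an L2 budget `eps`. It computes that perturbation analytically and shows the loss it induces. The embeddings are hypothetical, and the code is illustrative rather than any specific published method.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(p_u, q_i, q_j):
    """BPR loss for user u preferring item i over item j under matrix factorization."""
    return -math.log(sigmoid(dot(p_u, q_i) - dot(p_u, q_j)))


def adversarial_user_perturbation(p_u, q_i, q_j, eps):
    """Fast-gradient perturbation of the user embedding: eps * grad / ||grad||."""
    d = dot(p_u, q_i) - dot(p_u, q_j)
    # dL/dp_u = -(1 - sigmoid(d)) * (q_i - q_j)
    grad = [-(1.0 - sigmoid(d)) * (a - b) for a, b in zip(q_i, q_j)]
    norm = math.sqrt(dot(grad, grad))
    if norm == 0.0:
        return [0.0] * len(p_u)
    return [eps * g / norm for g in grad]


# Hypothetical 2-d embeddings.
p_u, q_i, q_j = [0.5, 0.2], [0.8, 0.1], [0.1, 0.4]
delta = adversarial_user_perturbation(p_u, q_i, q_j, eps=0.5)
p_adv = [a + b for a, b in zip(p_u, delta)]
clean, attacked = bpr_loss(p_u, q_i, q_j), bpr_loss(p_adv, q_i, q_j)
# An adversarial regularizer would add lambda * attacked to the training objective.
print(clean, attacked)
```

The BPR loss decreases monotonically in the score margin, and the perturbation moves the user embedding against `q_i - q_j`. As a result, the attacked loss is strictly larger than the clean loss whenever `q_i != q_j`.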
Citations: 30
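The adversarial-regularization idea described in the AML-RecSys abstract (perturb model parameters in the worst-case direction, then train against that perturbation as a minimax game) can be illustrated with a minimal pure-Python sketch in the style of adversarial personalized ranking (APR) on a BPR matrix-factorization model. This is our own illustration under stated assumptions, not the tutorial's code. The inner maximization is approximated by a single fast-gradient step Δ = ε·G/‖G‖ on the user and item embeddings, and the outer minimization takes one SGD step on L_BPR(Θ) + λ·L_BPR(Θ + Δ). The helper names (`bpr_loss`, `adversarial_perturbation`, `apr_step`) and the hyperparameters `eps`, `lam`, `lr` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bpr_loss(p_u, q_i, q_j):
    """-ln sigmoid(x_uij) with x_uij = p_u . (q_i - q_j): user u should score
    the observed item i above the unobserved item j."""
    x = dot(p_u, [a - b for a, b in zip(q_i, q_j)])
    return math.log1p(math.exp(-x))


def bpr_grads(p_u, q_i, q_j):
    """Gradients of bpr_loss with respect to p_u, q_i and q_j."""
    d = [a - b for a, b in zip(q_i, q_j)]
    s = sigmoid(-dot(p_u, d))  # equals 1 - sigmoid(x_uij)
    return [-s * v for v in d], [-s * v for v in p_u], [s * v for v in p_u]


def adversarial_perturbation(p_u, q_i, q_j, eps):
    """Inner max, approximated by one fast-gradient step: Delta = eps * G/||G||,
    where G is the loss gradient, normalised jointly over the three vectors."""
    grads = bpr_grads(p_u, q_i, q_j)
    norm = math.sqrt(sum(v * v for g in grads for v in g)) or 1.0
    return [[eps * v / norm for v in g] for g in grads]


def apr_step(p_u, q_i, q_j, lr=0.1, eps=0.05, lam=1.0):
    """Outer min: one SGD step on L_BPR(theta) + lam * L_BPR(theta + Delta),
    holding Delta fixed at the adversarial perturbation."""
    d_p, d_i, d_j = adversarial_perturbation(p_u, q_i, q_j, eps)
    clean = bpr_grads(p_u, q_i, q_j)
    adv = bpr_grads(add(p_u, d_p), add(q_i, d_i), add(q_j, d_j))
    return tuple(
        [w - lr * (gc + lam * ga) for w, gc, ga in zip(vec, g_clean, g_adv)]
        for vec, g_clean, g_adv in zip((p_u, q_i, q_j), clean, adv)
    )


# Toy usage: one user, one observed item i, one sampled negative item j.
p, qi, qj = [0.5, 0.2], [0.3, -0.1], [-0.2, 0.4]
d_p, d_i, d_j = adversarial_perturbation(p, qi, qj, eps=0.05)
print("clean loss:", bpr_loss(p, qi, qj))
print("perturbed loss:", bpr_loss(add(p, d_p), add(qi, d_i), add(qj, d_j)))
for _ in range(50):
    p, qi, qj = apr_step(p, qi, qj)
print("clean loss after 50 APR steps:", bpr_loss(p, qi, qj))
```

Holding Δ fixed during the update makes each call one round of the max-then-min game. A real recommender would batch many (u, i, j) triples over full embedding tables rather than update three vectors in isolation.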
AutoBlock
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371813
Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Dong, Christos Faloutsos, David Page
Entity matching seeks to identify data records, within one or across multiple data sources, that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most traditional blocking methods are learning-free and key-based, and their success rests largely on laborious human effort in cleaning data and designing blocking keys. In this paper, we propose AutoBlock, a novel hands-off blocking framework for entity matching based on similarity-preserving representation learning and nearest neighbor search. Our contributions include: (a) Automation: AutoBlock frees users from laborious data cleaning and blocking-key tuning. (b) Scalability: AutoBlock has sub-quadratic total time complexity and can easily be deployed on millions of records. (c) Effectiveness: AutoBlock outperforms a wide range of competitive baselines on multiple large-scale, real-world datasets, especially when the datasets are dirty and/or unstructured.
Citations: 0
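The blocking idea in the AutoBlock abstract (map records to vectors so that similar records lie close together, then retrieve near neighbours instead of comparing all pairs) can be sketched as follows. This is not AutoBlock itself: a bag of character trigrams stands in for the learned similarity-preserving representation, and random-hyperplane LSH (SimHash) with banding is one common choice of approximate nearest-neighbour index for cosine similarity, assumed here for illustration. All function names and parameters (`embed`, `signature`, `lsh_block`, `n_bands`, `rows_per_band`, `seed`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a learned, similarity-preserving embedding:
    a sparse bag of character trigrams of the lower-cased record."""
    s = f"  {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def signature(vec: Counter, n_planes: int, seed: int) -> tuple:
    """Random-hyperplane (SimHash) signature: one sign bit per hyperplane.
    The weight of feature f on plane j comes from a Random seeded with a
    string, so it is the same for every record (and across runs)."""
    bits = []
    for j in range(n_planes):
        score = sum(random.Random(f"{seed}:{j}:{f}").gauss(0.0, 1.0) * c
                    for f, c in vec.items())
        bits.append(1 if score >= 0 else 0)
    return tuple(bits)


def lsh_block(records, n_bands=4, rows_per_band=4, seed=0):
    """Candidate pairs (i, j), i < j, whose signatures agree on >= 1 band."""
    sigs = [signature(embed(r), n_bands * rows_per_band, seed) for r in records]
    candidates = set()
    for b in range(n_bands):
        buckets = defaultdict(list)
        for idx, sig in enumerate(sigs):
            buckets[sig[b * rows_per_band:(b + 1) * rows_per_band]].append(idx)
        for ids in buckets.values():
            # ids were appended in increasing order, so ids[x] < ids[y]
            for x in range(len(ids)):
                for y in range(x + 1, len(ids)):
                    candidates.add((ids[x], ids[y]))
    return candidates


records = [
    "Apple iPhone 11 Pro 64GB",
    "apple iphone 11 pro 64gb",
    "Samsung Galaxy S10 128GB",
    "Canon EOS 80D body",
]
print(sorted(lsh_block(records)))
```

Only pairs that share a bucket in some band are passed on to the matcher, so with reasonably small buckets the cost stays well below enumerating all n(n-1)/2 pairs. More bands raise recall; more rows per band make each bucket stricter.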
Balanced Influence Maximization in Attributed Social Network Based on Sampling
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371833
Mingkai Lin, Wenzhong Li, Sanglu Lu
Influence maximization in social networks is the problem of finding a set of seed nodes that maximizes the spread of influence under a given information propagation model; it has become an important topic in social network analysis. In this paper, we show that conventional influence maximization algorithms cause uneven spread of influence among different attribute groups in social networks, which could lead to more severe bias in public opinion dissemination and viral marketing. We formulate the balanced influence maximization problem to address the trade-off between influence maximization and attribute balance, and propose a sampling-based solution to solve it efficiently. To avoid exploring the full network, we first propose an attribute-based (AB) sampling method that samples attributed social networks while preserving network structural properties and the attribute proportions among user groups. We then propose an attribute-based reverse influence sampling (AB-RIS) algorithm to select seed nodes from the sampled graph. The proposed AB-RIS algorithm runs efficiently with guaranteed accuracy and achieves the trade-off between influence maximization and attribute balance. Extensive experiments on four real-world social network datasets show that AB-RIS significantly outperforms state-of-the-art approaches in balanced influence maximization.
Citations: 7
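AB-RIS builds on reverse influence sampling (RIS). The sketch below shows only plain RIS under the independent cascade model with a single uniform edge probability `p` (an assumption), together with a per-group spread estimate that makes the attribute imbalance described in the abstract measurable. It does not reproduce the paper's AB sampling step or its balancing objective, and the names `random_rr_set`, `ris_seeds` and `group_spread` are hypothetical.

```python
import random
from collections import defaultdict


def random_rr_set(in_edges, nodes, p, rng):
    """One reverse-reachable (RR) set under the independent cascade model:
    choose a uniformly random root, then walk edges backwards, keeping each
    edge with probability p. Returns (root, set of nodes that reach root)."""
    root = rng.choice(nodes)
    visited = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in in_edges.get(v, ()):
            if u not in visited and rng.random() < p:
                visited.add(u)
                stack.append(u)
    return root, visited


def ris_seeds(edges, nodes, k, p=0.1, n_samples=2000, seed=0):
    """Plain RIS: sample RR sets, then greedily pick k nodes that cover the
    most not-yet-covered RR sets (greedy maximum coverage)."""
    rng = random.Random(seed)
    in_edges = defaultdict(list)
    for u, v in edges:  # edge (u, v): u may influence v
        in_edges[v].append(u)
    samples = [random_rr_set(in_edges, nodes, p, rng) for _ in range(n_samples)]
    covered = [False] * n_samples
    seeds = []
    for _ in range(k):
        counts = defaultdict(int)
        for idx, (_, rr) in enumerate(samples):
            if not covered[idx]:
                for u in rr:
                    counts[u] += 1
        if not counts:  # every RR set is already covered
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        for idx, (_, rr) in enumerate(samples):
            if best in rr:
                covered[idx] = True
    return seeds, samples


def group_spread(samples, seeds, group_of, group_sizes):
    """Estimated expected number of influenced nodes per attribute group:
    the fraction of RR sets rooted in group g that the seed set hits,
    scaled by |g|. Unequal per-capita values indicate imbalance."""
    seed_set = set(seeds)
    hit, total = defaultdict(int), defaultdict(int)
    for root, rr in samples:
        g = group_of[root]
        total[g] += 1
        if rr & seed_set:
            hit[g] += 1
    return {g: group_sizes[g] * hit[g] / total[g] for g in total}


nodes = list(range(5))
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
groups = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
seeds, samples = ris_seeds(edges, nodes, k=1, p=0.5)
print(seeds, group_spread(samples, seeds, groups, {"a": 2, "b": 3}))
```

The group estimate works because an RR set rooted at node v is hit by the seed set exactly when v would be activated in the corresponding sampled cascade; averaging over roots in a group and scaling by the group size gives that group's expected activated count.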
Venue: Proceedings of the 13th International Conference on Web Search and Data Mining