
Proceedings of The Web Conference 2020: Latest Articles

Deconstructing Google’s Web Light Service
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380168
Ammar Tahir, Muhammad Tahir Munir, Shaiq Munir Malik, Z. Qazi, I. Qazi
Web Light is a transcoding service introduced by Google to show lighter and faster webpages to users searching on slow mobile clients. The service detects slow clients (e.g., users on 2G) and tries to convert webpages on the fly into a version optimized for these clients. Web Light claims to significantly reduce page load times, save user data, and substantially increase traffic to such webpages. However, there are several concerns around this service, including its effectiveness in preserving relevant content on a page, showing third-party advertisements, and improving user performance, as well as privacy concerns for users and publishers. In this paper, we perform the first independent, empirical analysis of Google’s Web Light service to shed light on these concerns. Through a combination of experiments with thousands of real Web Light pages and controlled experiments with synthetic Web Light pages, we (i) deconstruct how Web Light modifies webpages, (ii) investigate how ads are shown on Web Light and which ad networks are supported, (iii) measure and compare Web Light’s page load performance, (iv) discuss privacy concerns for users and publishers, and (v) investigate the potential use of Web Light as a censorship circumvention tool.
Citations: 8
Dynamic Composition for Conversational Domain Exploration
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380167
Idan Szpektor, Deborah Cohen, G. Elidan, Michael Fink, A. Hassidim, Orgad Keller, Sayali Kulkarni, E. Ofek, S. Pudinsky, Asaf Revach, Shimi Salant
We study conversational domain exploration (CODEX), where the user’s goal is to enrich her knowledge of a given domain by conversing with an informative bot. Such conversations should be well grounded in high-quality domain knowledge as well as engaging and open-ended. A CODEX bot should be proactive and introduce relevant information even if the user does not directly ask for it. The bot should also appropriately pivot the conversation to undiscovered regions of the domain. To address these dialogue characteristics, we introduce a novel approach termed dynamic composition that decouples candidate content generation from the flexible composition of bot responses. This allows the bot to control the source, correctness and quality of the offered content, while achieving flexibility via a dialogue manager that selects the most appropriate contents in a compositional manner. We implemented a CODEX bot based on dynamic composition and integrated it into the Google Assistant. As an example domain, the bot conversed about the NBA basketball league in a seamless experience, such that users were not aware whether they were conversing with the vanilla system or the one augmented with our CODEX bot. Results are positive and offer insights into what makes for a good conversation. To the best of our knowledge, this is the first real user experiment of open-ended dialogues as part of a commercial assistant system.
Citations: 10
Natural Language Annotations for Search Engine Optimization
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380049
P. Jenkins, Jennifer Zhao, Heath Vinicombe, Anant Subramanian, Arun Prasad, Atillia Dobi, E. Li, Yunsong Guo
Understanding content at scale is a difficult but important problem for many platforms. Many previous studies focus on content understanding to optimize engagement with existing users. However, little work has studied how to leverage better content understanding to attract new users. In this work, we build a framework for generating natural language content annotations and show how they can be used for search engine optimization. The proposed framework relies on an XGBoost model that labels “pins” with high-probability phrases, and a logistic regression layer that learns to rank aggregated annotations for groups of content. The pipeline identifies keywords that are descriptive and contextually meaningful. We perform a large-scale production experiment deployed on the Pinterest platform and show that natural language annotations cause a statistically significant 1-2% increase in traffic from leading search engines. Finally, we explore and interpret the characteristics of our annotations framework.
Citations: 3
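As a rough illustration of the two-stage idea in the abstract above (per-pin phrase probabilities, followed by a logistic layer that ranks aggregated annotations for a group of content), here is a minimal Python sketch. It is not the authors' pipeline: the per-pin probabilities merely stand in for the output of a trained classifier such as XGBoost, and the two features, the weights, the sample data, and the names `aggregate` and `rank_annotations` are all hypothetical.

```python
import math
from collections import defaultdict

def aggregate(pin_scores):
    """Collect, per phrase, the probabilities assigned across all pins in a group.

    pin_scores: list of dicts mapping phrase -> probability, one dict per pin.
    The probabilities are assumed to come from an upstream classifier.
    """
    by_phrase = defaultdict(list)
    for scores in pin_scores:
        for phrase, prob in scores.items():
            by_phrase[phrase].append(prob)
    return by_phrase

def rank_annotations(pin_scores, w_mean=2.0, w_coverage=3.0, bias=-2.5):
    """Rank phrases by a logistic score over two hand-picked features.

    The weights are illustrative placeholders; in practice they would be learned.
    """
    n_pins = len(pin_scores)
    ranked = []
    for phrase, probs in aggregate(pin_scores).items():
        mean_prob = sum(probs) / len(probs)  # average confidence where the phrase appears
        coverage = len(probs) / n_pins       # share of pins in the group carrying the phrase
        z = bias + w_mean * mean_prob + w_coverage * coverage
        ranked.append((phrase, 1.0 / (1.0 + math.exp(-z))))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Made-up group of three pins with made-up phrase probabilities.
board = [
    {"mid-century sofa": 0.9, "living room": 0.7},
    {"mid-century sofa": 0.8, "velvet": 0.6},
    {"mid-century sofa": 0.85, "living room": 0.6},
]
ranking = rank_annotations(board)
```

With these placeholder weights, a phrase that appears confidently on every pin in the group ranks ahead of one that appears on only a single pin, which is the intuition behind ranking aggregated rather than per-pin annotations.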
Multi-Context Attention for Entity Matching
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380017
Dongxiang Zhang, Yuyang Nie, Sai Wu, Yanyan Shen, K. Tan
Entity matching (EM) is a classic research problem that identifies data instances referring to the same real-world entity. A recent technical trend in this area is to take advantage of deep learning (DL) to automatically extract discriminative features. DeepER and DeepMatcher have emerged as two pioneering DL models for EM. However, these two state-of-the-art solutions simply incorporate vanilla RNNs and straightforward attention mechanisms. In this paper, we fully exploit the semantic context of the embedding vectors for the pair of entity text descriptions. In particular, we propose an integrated multi-context attention framework that takes into account self-attention, pair-attention and global-attention from three types of context. The idea is further extended to incorporate attribute attention in order to support structured datasets. We conduct extensive experiments with 7 publicly accessible benchmark datasets. The experimental results show that our approach outperforms DeepER and DeepMatcher on all of the datasets.
Citations: 24
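To make the distinction between self-attention and pair-attention in the abstract above more concrete, the following Python sketch applies plain (unscaled) dot-product attention to two toy token sequences. This is a generic textbook form of attention, not the model proposed in the paper; global attention and attribute attention are omitted, and the embeddings and the helper names `softmax`, `dot` and `attend` are assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys):
    """For each query vector, return (weights over keys, weighted sum of keys)."""
    results = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        context = [
            sum(w * k[d] for w, k in zip(weights, keys))
            for d in range(len(keys[0]))
        ]
        results.append((weights, context))
    return results

# Toy 2-d token embeddings for two entity descriptions (made-up values).
entity_a = [[1.0, 0.0], [0.0, 1.0]]
entity_b = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

self_context = attend(entity_a, entity_a)  # self-attention: tokens attend within one description
pair_context = attend(entity_a, entity_b)  # pair-attention: tokens of A attend over tokens of B
```

In the pair-attention call, each token of the first description receives a weight distribution over the tokens of the second description, so the most similar token on the other side contributes most to its context vector.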
Few-Sample and Adversarial Representation Learning for Continual Stream Mining
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380153
Zhuoyi Wang, Yigong Wang, Yu Lin, Evan Delord, L. Khan
Deep Neural Networks (DNNs) have primarily been shown to be useful for closed-world classification problems, where the number of categories is fixed. However, DNNs notoriously fail at label prediction in non-stationary data streams, in which unknown or novel classes (categories not in the training set) continually emerge. For example, new topics continually emerge in social media or e-commerce. To solve this challenge, a DNN should not only detect novel classes effectively but also incrementally learn new concepts from limited samples over time. Literature that addresses both problems simultaneously is limited. In this paper, we focus on improving the generalization of the model to novel classes and on making the model continually learn from only a few samples of the novel categories. Unlike existing approaches that rely on abundant labeled instances to re-train or update the model, we propose a new approach based on Few-Sample and Adversarial Representation Learning (FSAR). The key novelty is that we introduce an adversarial confusion term into both the representation learning and the few-sample learning process, which reduces the model’s over-confidence on the seen classes and further enhances its ability to detect and learn new categories from only a few samples. FSAR is trained in two stages: first, it learns a feature embedding that is compact within classes and separated between classes in order to detect novel classes; next, we collect a few labeled samples belonging to the new categories and use episode-based training to exploit their intrinsic features for few-sample learning. We evaluated FSAR on different datasets, and extensive experimental results from various simulated stream benchmarks show that FSAR outperforms current state-of-the-art approaches.
Citations: 13
Multimodal Post Attentive Profiling for Influencer Marketing
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380052
Seungbae Kim, Jyun-Yu Jiang, Masaki Nakada, Jinyoung Han, Wei Wang
Influencer marketing has become a key marketing method for brands in recent years. Brands have been increasingly utilizing influencers’ social networks to reach niche markets, and researchers have been studying various aspects of influencer marketing. However, brands have often struggled to find and hire the right influencers with specific interests/topics for their marketing, due to a lack of available influencer data and/or the limited capacity of marketing agencies. This paper proposes a multimodal deep learning model that uses text and image information from social media posts (i) to classify influencers into specific interests/topics (e.g., fashion, beauty) and (ii) to classify their posts into certain categories. We use the attention mechanism to select the posts that are more relevant to the topics of influencers, thereby generating useful influencer representations. We conduct experiments on a dataset crawled from Instagram, which is the most popular social media platform for influencer marketing. The experimental results show that our proposed model significantly outperforms existing user profiling methods, achieving 98% and 96% accuracy in classifying influencers and their posts, respectively. We release our influencer dataset of 33,935 influencers labeled with specific topics based on 10,180,500 posts to facilitate future research.
Citations: 20
Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380014
T. Tran, Mohamed H. Gad-Elrab, D. Stepanova, E. Kharlamov, Jannik Strötgen
Knowledge graphs (KGs) are essential resources for many applications including Web search and question answering. As KGs are often automatically constructed, they may contain incorrect facts. Detecting them is a crucial, yet extremely expensive task. Prominent solutions detect and explain inconsistency in KGs with respect to accompanying ontologies that describe the KG domain of interest. Compared to machine learning methods, they are more reliable and human-interpretable but scale poorly on large KGs. In this paper, we present a novel approach to dramatically speed up the process of detecting and explaining inconsistency in large KGs by exploiting KG abstractions that capture prominent data patterns. Though much smaller, KG abstractions preserve inconsistencies and their explanations. Our experiments with large KGs (e.g., DBpedia and Yago) demonstrate the feasibility of our approach and show that it significantly outperforms the popular baseline.
Citations: 9
A Data-Driven Metric of Incentive Compatibility
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380249
Yuan Deng, Sébastien Lahaie, V. Mirrokni, Song Zuo
An incentive-compatible auction incentivizes buyers to truthfully reveal their private valuations. However, many ad auction mechanisms deployed in practice are not incentive-compatible, such as first-price auctions (for display advertising) and the generalized second-price auction (for search advertising). We introduce a new metric to quantify incentive compatibility in both static and dynamic environments. Our metric is data-driven and can be computed directly through black-box auction simulations without relying on reference mechanisms or complex optimizations. We provide interpretable characterizations of our metric and prove that it is monotone in auction parameters for several mechanisms used in practice, such as soft floors and dynamic reserve prices. We empirically evaluate our metric on ad auction data from a major ad exchange and a major search engine to demonstrate its broad applicability in practice.
Citations: 11
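The contrast drawn in the abstract above, between incentive-compatible and non-incentive-compatible auctions probed through black-box simulation, can be illustrated with a small Python experiment. The sketch below does not implement the metric defined in the paper. It only estimates how much a single bidder gains by uniformly shading bids, treating each auction as a black box; the single-competitor setting, the uniform value distribution, the shading factor of 0.8, and all function names are assumptions chosen for illustration.

```python
import random

def first_price(bid, competing):
    """Black-box auction: highest bid wins and the winner pays their own bid."""
    return bid > competing, bid

def second_price(bid, competing):
    """Black-box auction: highest bid wins and the winner pays the competing bid."""
    return bid > competing, competing

def average_utility(auction, samples, shade):
    """Mean utility when the bidder bids shade * value in every sampled auction."""
    total = 0.0
    for value, competing in samples:
        won, price = auction(shade * value, competing)
        if won:
            total += value - price
    return total / len(samples)

def shading_gain(auction, samples, shade=0.8):
    """Utility gained by shading bids relative to bidding truthfully."""
    truthful = average_utility(auction, samples, 1.0)
    return average_utility(auction, samples, shade) - truthful

# Simulated (value, highest competing bid) pairs; fixed seed for repeatability.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(10_000)]

gain_first = shading_gain(first_price, samples)
gain_second = shading_gain(second_price, samples)
```

Under these assumptions, shading pays off in the first-price auction (a truthful winner earns zero utility there) but not in the second-price auction, where truthful bidding is already optimal. A data-driven measure of incentive compatibility builds on this kind of observable difference in how outcomes respond to bid perturbations.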
RLPer: A Reinforcement Learning Model for Personalized Search
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380294
Jing Yao, Zhicheng Dou, Jun Xu, Ji-rong Wen
Personalized search improves generic ranking models by taking user interests into consideration and returning more accurate search results to individual users. In recent years, machine learning and deep learning techniques have been successfully applied in personalized search. Most existing personalization models simply regard the search history as a static set of user behaviours and learn fixed ranking strategies based on the recorded data. Though improvements have been observed, these methods ignore the dynamic nature of the search process: search is a sequence of interactions between the search engine and the user, during which the user’s interests may change. It would be more helpful if a personalized search model could track the whole interaction process and update its ranking strategy continuously. In this paper, we propose a reinforcement learning based personalization model, referred to as RLPer, to track the sequential interactions between the users and the search engine with a hierarchical Markov Decision Process (MDP). In RLPer, the search engine interacts with the user to update the underlying ranking model continuously with real-time feedback. We also design a feedback-aware personalized ranking component to capture the user’s feedback, which affects the user interest profile for the next query. Experimental results on the publicly available AOL search log verify that our proposed model can significantly outperform state-of-the-art personalized search models.
Citations: 20
An Intent-Based Automation Framework for Securing Dynamic Consumer IoT Infrastructures
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380234
Vasudevan Nagendra, A. Bhattacharya, V. Yegneswaran, Amir Rahmati, Samir R Das
Consumer IoT networks are characterized by heterogeneous devices with diverse functionality and programming interfaces. This lack of homogeneity makes the integration and secure management of IoT infrastructures a daunting task for users and administrators. In this paper, we introduce VISCR, a Vendor-Independent policy Specification and Conflict Resolution engine that enables intent-based, conflict-free policy specification and enforcement in IoT environments. VISCR converts the topology of the IoT infrastructure into a tree-based abstraction and translates existing policies from heterogeneous vendor-specific programming languages, such as Groovy-based SmartThings, OpenHAB, IFTTT-based templates, and MUD-based profiles, into a vendor-independent graph-based specification. These representations are then used to automatically detect rogue policies, policy conflicts, and automation bugs. We evaluated VISCR on a dataset of 907 IoT apps, programmed using heterogeneous automation specifications, in a simulated smart-building IoT infrastructure. In our experiments, VISCR flagged 342 of the 907 IoT apps as exhibiting one or more violations, while running 14.2x faster than the state-of-the-art tool (Soteria). VISCR detected 100% of the violations reported by Soteria and also detected new types of violations in 266 additional apps.
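To make the notion of a policy conflict concrete, here is a minimal sketch. It is not VISCR's tree or graph algorithm: it assumes policies from different vendors have already been normalized into a hypothetical `(trigger, device, action)` form, and it reports pairs that demand contradictory actions on the same device under the same trigger. The `Policy` type, the `OPPOSITES` table, and the rule names are all illustrative assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Policy(NamedTuple):
    """A vendor-independent rule: when `trigger` fires, apply `action` to `device`."""
    name: str
    trigger: str
    device: str
    action: str


# Hypothetical pairs of actions that cannot both hold on one device.
OPPOSITES = {
    frozenset({"on", "off"}),
    frozenset({"lock", "unlock"}),
    frozenset({"open", "close"}),
}


def find_conflicts(policies):
    """Return pairs of policy names that contradict each other."""
    by_key = defaultdict(list)
    for p in policies:
        # Group rules that react to the same trigger on the same device.
        by_key[(p.trigger, p.device)].append(p)

    conflicts = []
    for group in by_key.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if frozenset({a.action, b.action}) in OPPOSITES:
                    conflicts.append((a.name, b.name))
    return conflicts


policies = [
    Policy("smartthings_rule", "motion_detected", "hall_light", "on"),
    Policy("ifttt_rule", "motion_detected", "hall_light", "off"),
    Policy("openhab_rule", "door_closed", "front_door", "lock"),
]
print(find_conflicts(policies))
```

Grouping by `(trigger, device)` first keeps the pairwise comparison confined to rules that can actually interact, which is one simple way to avoid comparing every rule against every other.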
Citations: 13
Journal
Proceedings of The Web Conference 2020