Proceedings of The Web Conference 2020: Latest Papers, Page 8

A Data-Driven Metric of Incentive Compatibility

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380249

Yuan Deng, Sébastien Lahaie, V. Mirrokni, Song Zuo

An incentive-compatible auction incentivizes buyers to truthfully reveal their private valuations. However, many ad auction mechanisms deployed in practice are not incentive-compatible, such as first-price auctions (for display advertising) and the generalized second-price auction (for search advertising). We introduce a new metric to quantify incentive compatibility in both static and dynamic environments. Our metric is data-driven and can be computed directly through black-box auction simulations without relying on reference mechanisms or complex optimizations. We provide interpretable characterizations of our metric and prove that it is monotone in auction parameters for several mechanisms used in practice, such as soft floors and dynamic reserve prices. We empirically evaluate our metric on ad auction data from a major ad exchange and a major search engine to demonstrate its broad applicability in practice.
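The idea of probing an auction as a black box can be illustrated with a small simulation. The sketch below is not the metric defined in the paper; it only assumes a buyer who scales a private value by a constant factor while rivals bid truthfully, and compares the resulting average utility under second-price and first-price rules. All function names, the uniform value distribution, and the number of rivals are illustrative assumptions.

```python
import random


def second_price(bids):
    """Highest bid wins and pays the second-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]


def first_price(bids):
    """Highest bid wins and pays its own bid."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    return winner, bids[winner]


def avg_utility(auction, scale, rounds=5000, rivals=3, seed=0):
    """Average utility of buyer 0, who bids scale * value (assumed setup).

    Values are drawn uniformly from [0, 1); rivals bid their true values.
    The fixed seed makes every scale see the same sampled auctions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        values = [rng.random() for _ in range(rivals + 1)]
        bids = [scale * values[0]] + values[1:]
        winner, price = auction(bids)
        if winner == 0:
            total += values[0] - price
    return total / rounds


def best_scale(auction, scales):
    """Scaling factor with the highest simulated average utility."""
    return max(scales, key=lambda s: avg_utility(auction, s))


if __name__ == "__main__":
    scales = [0.5, 0.75, 1.0, 1.25]
    print("second price:", best_scale(second_price, scales))
    print("first price:", best_scale(first_price, scales))
```

In this toy setting, truthful bidding (a scale of 1.0) is the best choice under the second-price rule, while under the first-price rule the buyer gains by shading the bid, which is the kind of deviation an incentive-compatibility measure is meant to detect.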


Citations: 11

P-Simrank: Extending Simrank to Scale-Free Bipartite Networks

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380081

Prasenjit Dey, Kunal Goel, Rahul Agrawal

Measuring the similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of node similarity in a graph that has both theoretical and intuitive properties and has been extensively studied and used in many applications such as query rewriting, link prediction, and collaborative filtering. Existing work based on Simrank primarily focuses on preserving the microscopic structure, such as the second- and third-order proximity of the vertices, while the macroscopic scale-free property is largely ignored. The scale-free property is a critical property of real-world web graphs, in which the vertex degrees follow a heavy-tailed distribution. In this paper, we introduce P-Simrank, which extends the idea of Simrank to scale-free bipartite networks. To study the efficacy of the proposed solution on a real-world problem, we tested it on the well-known query-rewriting problem in the sponsored search domain using a bipartite click graph, similar to Simrank++ [1], which acts as our baseline. We show that Simrank++ produces sub-optimal similarity scores on bipartite graphs whose vertex degree distribution follows a power law. We also show how P-Simrank can be optimized for large real-world graphs. Finally, we experimentally evaluate the P-Simrank algorithm against Simrank++, using actual click graphs obtained from Bing, and show that P-Simrank outperforms Simrank++ on a variety of metrics.
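For orientation, the basic SimRank recursion that this line of work builds on can be sketched as follows. This is the plain iterative form (two nodes are similar if their neighbors are similar), applied to a tiny undirected query-ad click graph; it is not P-Simrank or Simrank++, and the graph, decay factor, and iteration count are illustrative assumptions.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank over an adjacency dict {node: [neighbors]}.

    s(a, a) = 1; for a != b, s(a, b) is the decayed average similarity
    over all pairs of neighbors, or 0 if either node has no neighbors.
    """
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                elif not neighbors[a] or not neighbors[b]:
                    updated[(a, b)] = 0.0
                else:
                    total = sum(
                        sim[(x, y)]
                        for x, y in product(neighbors[a], neighbors[b])
                    )
                    norm = len(neighbors[a]) * len(neighbors[b])
                    updated[(a, b)] = decay * total / norm
        sim = updated
    return sim


# Hypothetical click graph: q1 and q2 share both ads, q3 is disconnected from them.
click_graph = {
    "q1": ["a1", "a2"],
    "q2": ["a1", "a2"],
    "q3": ["a3"],
    "a1": ["q1", "q2"],
    "a2": ["q1", "q2"],
    "a3": ["q3"],
}

if __name__ == "__main__":
    scores = simrank(click_graph)
    print(scores[("q1", "q2")], scores[("q1", "q3")])
```

The all-pairs loop is quadratic in the number of nodes and is only meant to show the recursion; it would not scale to real click graphs.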


Citations: 5

An Intent-Based Automation Framework for Securing Dynamic Consumer IoT Infrastructures

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380234

Vasudevan Nagendra, A. Bhattacharya, V. Yegneswaran, Amir Rahmati, Samir R Das

Consumer IoT networks are characterized by heterogeneous devices with diverse functionality and programming interfaces. This lack of homogeneity makes the integration and secure management of IoT infrastructures a daunting task for users and administrators. In this paper, we introduce VISCR, a Vendor-Independent policy Specification and Conflict Resolution engine that enables intent-based conflict-free policy specification and enforcement in IoT environments. VISCR converts the topology of the IoT infrastructure into a tree-based abstraction and translates existing policies from heterogeneous vendor-specific programming languages, such as Groovy-based SmartThings, OpenHAB, IFTTT-based templates, and MUD-based profiles, into a vendor-independent graph-based specification. These are then used to automatically detect rogue policies, policy conflicts, and automation bugs. We evaluated VISCR using a dataset of 907 IoT apps, programmed using heterogeneous automation specifications, in a simulated smart-building IoT infrastructure. In our experiments, VISCR flagged 342 of the 907 IoT apps as exhibiting one or more violations, while also running 14.2x faster than the state-of-the-art tool (Soteria). VISCR detected 100% of the violations reported by Soteria while also detecting new types of violations in 266 additional apps.
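One simple kind of policy conflict can be illustrated with a toy checker. The sketch below is not VISCR's graph-based specification; it assumes automation rules have already been normalized into hypothetical (app, trigger, device, action) tuples and reports pairs of apps that react to the same trigger on the same device with different actions.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical vendor-independent form of one automation rule."""
    app: str
    trigger: str
    device: str
    action: str


def find_conflicts(rules):
    """Return (app, app) pairs whose rules disagree for the same trigger and device."""
    by_key = defaultdict(list)
    for rule in rules:
        by_key[(rule.trigger, rule.device)].append(rule)

    conflicts = []
    for group in by_key.values():
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if first.action != second.action:
                    conflicts.append((first.app, second.app))
    return conflicts


if __name__ == "__main__":
    rules = [
        Rule("app_a", "smoke_detected", "front_door", "unlock"),
        Rule("app_b", "smoke_detected", "front_door", "lock"),
        Rule("app_c", "motion_detected", "hall_light", "on"),
    ]
    print(find_conflicts(rules))
```

A real engine would also need to reason about overlapping conditions, device hierarchies, and indirect interactions, which this equality-based check ignores.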


Citations: 13

RLPer: A Reinforcement Learning Model for Personalized Search

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380294

Jing Yao, Zhicheng Dou, Jun Xu, Ji-rong Wen

Personalized search improves generic ranking models by taking user interests into consideration and returning more accurate search results to individual users. In recent years, machine learning and deep learning techniques have been successfully applied in personalized search. Most existing personalization models simply regard the search history as a static set of user behaviours and learn fixed ranking strategies based on the recorded data. Though improvements have been observed, these methods ignore the dynamic nature of the search process: search is a sequence of interactions between the search engine and the user. During the search process, user interests may change dynamically. It would be more helpful if a personalized search model could track the whole interaction process and update its ranking strategy continuously. In this paper, we propose a reinforcement-learning-based personalization model, referred to as RLPer, to track the sequential interactions between users and the search engine with a hierarchical Markov Decision Process (MDP). In RLPer, the search engine interacts with the user to update the underlying ranking model continuously with real-time feedback. We also design a feedback-aware personalized ranking component to capture the user’s feedback, which affects the user interest profile for the next query. Experimental results on the publicly available AOL search log verify that our proposed model can significantly outperform state-of-the-art personalized search models.


Citations: 20

Privacy-preserving AI Services Through Data Decentralization

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380106

Christian Meurisch, Bekir Bayrak, M. Mühlhäuser

User services increasingly base their actions on AI models, e.g., to offer personalized and proactive support. However, the underlying AI algorithms require a continuous stream of personal data, which leads to privacy issues, as users typically have to share this data outside their own territory. Current privacy-preserving concepts are either not applicable to such AI-based services or work to the disadvantage of one of the parties. This paper presents PrivAI, a new decentralized and privacy-by-design platform for overcoming the need for sharing user data to benefit from personalized AI services. In short, PrivAI complements existing approaches to personal data stores, but strictly enforces the confinement of raw user data. PrivAI further addresses the resulting challenges by (1) dividing AI algorithms into cloud-based general model training, subsequent local personalization, and community-based sharing of model updates for new users; and (2) loading confidential AI models into a trusted execution environment, thus protecting the provider’s intellectual property (IP). Our experiments show the feasibility and effectiveness of PrivAI, with performance comparable to currently practiced approaches.


Citations: 19

A Category-Aware Deep Model for Successive POI Recommendation on Sparse Check-in Data

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380202

Fuqiang Yu, Li-zhen Cui, Wei Guo, Xudong Lu, Qingzhong Li, Hua Lu

As considerable amounts of POI check-in data have been accumulated, successive point-of-interest (POI) recommendation is increasingly popular. Existing successive POI recommendation methods only predict where a user will go next, ignoring when this behavior will occur. In this work, we focus on predicting POIs that will be visited by users in the next 24 hours. As check-in data is very sparse, it is challenging to accurately capture user preferences in temporal patterns. To this end, we propose a category-aware deep model, CatDM, that incorporates POI category and geographical influence to reduce the search space and overcome data sparsity. We design two deep encoders based on LSTM to model the time series data. The first encoder captures user preferences in POI categories, whereas the second exploits user preferences in POIs. Considering clock influence in the second encoder, we divide each user’s check-in history into several different time windows and develop a personalized attention mechanism for each window to help CatDM exploit temporal patterns. Moreover, to sort the candidate set, we consider four specific dependencies: user-POI, user-category, POI-time and POI-user current preferences. Extensive experiments are conducted on two large real datasets. The experimental results demonstrate that our CatDM outperforms the state-of-the-art models for successive POI recommendation on sparse check-in data.


Citations: 82

Democratizing Content Creation and Dissemination through AI Technology

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3382669

Wei-Ying Ma

With the rise of mobile video, user-generated content, and social networks, there is a massive opportunity for disruptive innovations in the media and content industry. It is now a fast-changing landscape with rapid advances in AI-powered content creation, dissemination and interaction technologies. I believe the current trends are leading us towards a world where everyone is equally empowered to produce high-quality content in video, music, augmented reality or more – and to share their information, knowledge, and stories with a large global audience. This new AI-powered content platform can further lead to innovations in advertising, e-commerce, online education, and productivity. I will share the current research efforts at ByteDance connected to this emerging new platform through products such as Douyin and TikTok, and discuss the challenges and the direction of our future research.


Citations: 1

Efficient Online Multi-Task Learning via Adaptive Kernel Selection

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3379993

Peng Yang, P. Li

Conventional multi-task models restrict the task structure to be linearly related, which may not be suitable when the data are not linearly separable. To remedy this issue, we propose a kernel algorithm for online multi-task classification, as the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function. Specifically, it maintains a local-global Gaussian distribution over each task model that guides the direction and scale of parameter updates. Nonetheless, optimizing over this space is computationally expensive. Moreover, most multi-task learning methods require access to the entire set of training instances, a luxury unavailable in the large-scale streaming learning scenario. To overcome this issue, we propose a randomized kernel sampling technique across multiple tasks. Instead of requiring the labels of all inputs, the proposed algorithm decides whether to query a label by considering the confidence of the related tasks in the label prediction. Theoretically, the algorithm trained on actively sampled labels can achieve a result comparable to one learned on all labels. Empirically, the proposed algorithm achieves promising learning efficacy while simultaneously reducing the computational complexity and labeling cost.
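The idea of querying a label only when the prediction is uncertain can be sketched with a much simpler learner. The example below is a single-task linear perceptron with a margin-based randomized query rule, in the style of selective-sampling perceptrons; it is a stand-in, not the paper's multi-task kernel algorithm. The parameter `b`, the function name, and the toy stream are assumptions.

```python
import random


def selective_perceptron(stream, dim, b=1.0, seed=0):
    """Online perceptron that asks for a label with probability b / (b + |margin|).

    stream yields (x, y) pairs with y in {-1, +1}; y is only used when the
    learner decides to query it. Returns the weights and the query count.
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    queried = 0
    for x, y in stream:
        margin = sum(w * xi for w, xi in zip(weights, x))
        # Confident predictions (large |margin|) are queried less often.
        if rng.random() < b / (b + abs(margin)):
            queried += 1
            if y * margin <= 0:  # mistake or zero margin: perceptron update
                weights = [w + y * xi for w, xi in zip(weights, x)]
    return weights, queried


if __name__ == "__main__":
    # Hypothetical separable stream: the sign of the first feature is the label.
    stream = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)] * 50
    weights, queried = selective_perceptron(stream, dim=2)
    print(weights, queried, "labels queried out of", len(stream))
```

A larger `b` queries more labels; as the margin grows, the query probability shrinks, which is how the labeling cost is reduced.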


Citations: 1

Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380150

Khoa D Doan, C. Reddy

Searching for documents with semantically similar content is a fundamental problem in the information retrieval domain with various challenges, primarily, in terms of efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error and bit balance/uncorrelation, are not effectively learned with existing methods. This is because of the requirement to either tune additional hyper-parameters or optimize these heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model which presents a novel representation learning framework that captures structured representation of text documents in the learned hash function. Also, adversarial training provides an alternative direction to implicitly learn a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder’s output of either a recurrent neural network or a convolutional autoencoder. We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of the related documents. The proposed method outperforms the current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.
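The retrieval side of text hashing can be sketched independently of how the codes are learned. The example below assumes each document already has a real-valued embedding (in DABA this would come from the trained encoder; here the vectors are made up), thresholds it at zero to get a binary code, and ranks documents by Hamming distance. The function names are hypothetical.

```python
def binarize(vector):
    """Pack a real-valued vector into an integer code: bit i is 1 if vector[i] > 0."""
    return sum(1 << i for i, value in enumerate(vector) if value > 0)


def hamming(code_a, code_b):
    """Number of differing bits between two integer codes."""
    return bin(code_a ^ code_b).count("1")


def nearest(query_code, codes, k):
    """Return the ids of the k documents whose codes are closest to the query."""
    return sorted(codes, key=lambda doc_id: hamming(query_code, codes[doc_id]))[:k]


if __name__ == "__main__":
    # Made-up embeddings standing in for the output of a trained encoder.
    embeddings = {
        "d1": [0.9, -0.4, 0.7],
        "d2": [-0.8, 0.6, -0.2],
        "d3": [-0.1, -0.3, 0.5],
    }
    codes = {doc_id: binarize(vec) for doc_id, vec in embeddings.items()}
    query = binarize([0.3, -1.2, 0.8])
    print(nearest(query, codes, k=2))
```

Because the comparison is an XOR followed by a bit count, lookups stay cheap even for long codes, which is the efficiency argument for hashing-based retrieval.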


Citations: 8

Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding

Proceedings of The Web Conference 2020

Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380230

Zhuoren Jiang, Zheng Gao, Jinjiong Lan, Hongxia Yang, Yao Lu, Xiaozhong Liu

The recent success of deep graph embedding has advanced methods for characterizing graph information. However, in real-world applications, such methods still struggle with heterogeneity, scalability, and multiplexity. To address these challenges, we propose a novel solution, Genetic hEterogeneous gRaph eMbedding (GERM), which enables flexible and efficient task-driven vertex embedding in a complex heterogeneous graph. Unlike prior work in this line of research, we employ a task-oriented genetic activation strategy to efficiently generate the “Edge Type Activated Vector” (ETAV) over the edge types in the graph. The generated ETAV not only reduces incompatible noise and guides the heterogeneous graph random walk at the graph-schema level, but also activates an optimized subgraph for efficient representation learning. Revealing the correlation between the graph structure and task information also improves model interpretability. In addition, we propose an activated heterogeneous skip-gram framework that encapsulates both the topological and the task-specific information of a given heterogeneous graph. Through extensive experiments on scholarly and e-commerce datasets, we demonstrate the efficacy and scalability of the proposed methods on various search and recommendation tasks. Compared with the baselines, GERM significantly reduces running time and removes expert intervention without sacrificing performance, and in some cases modestly improves it.
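The idea of evolving an activation vector over edge types and using it to restrict a random walk can be sketched as follows. This is an illustrative toy, not the GERM algorithm: the graph, the edge-type names, the fitness function, and the genetic operators (truncation selection, one-point crossover, bit-flip mutation) are all assumptions made for the example. The paper's task-oriented fitness and its skip-gram training are not reproduced here.

```python
import random

# Hypothetical toy heterogeneous graph: node -> list of (edge_type, neighbor).
GRAPH = {
    "p1": [("cites", "p2"), ("written_by", "a1"), ("published_in", "v1")],
    "p2": [("cites", "p1"), ("written_by", "a1")],
    "a1": [("writes", "p1"), ("writes", "p2")],
    "v1": [("publishes", "p1")],
}
EDGE_TYPES = ["cites", "written_by", "writes", "published_in", "publishes"]

def random_walk(graph, start, etav, length, rng):
    """Walk that only follows edges whose type is switched on in the ETAV."""
    active = {t for t, bit in zip(EDGE_TYPES, etav) if bit}
    walk = [start]
    for _ in range(length):
        options = [nbr for t, nbr in graph[walk[-1]] if t in active]
        if not options:
            break  # dead end under this activation
        walk.append(rng.choice(options))
    return walk

def toy_fitness(etav):
    """Stand-in for a task score: author nodes reached minus a sparsity penalty."""
    rng = random.Random(0)  # fixed seed so the score is deterministic
    hits = 0
    for _ in range(20):
        walk = random_walk(GRAPH, "p1", etav, 4, rng)
        hits += sum(node.startswith("a") for node in walk)
    return hits - 2 * sum(etav)

def evolve(fitness, n_bits, rng, pop_size=8, generations=20, mutation_rate=0.1):
    """Plain genetic search over binary vectors; returns the best vector seen."""
    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Truncation selection: keep the better half as parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best

best_etav = evolve(toy_fitness, len(EDGE_TYPES), random.Random(1))
print(dict(zip(EDGE_TYPES, best_etav)))
```

In this toy, switching an edge type off removes it from every walk, so the activated subgraph is exactly the set of edges the embedding step would see. That is the sense in which the abstract describes the ETAV as both guiding the walk and selecting a subgraph.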



Citations: 13