
Proceedings of the 30th ACM International Conference on Information & Knowledge Management: Latest Publications

Heterogeneous Graph Neural Networks for Large-Scale Bid Keyword Matching
Zongtao Liu, Bin Ma, Quanlian Liu, Jian Xu, Bo Zheng
Digital advertising is a critical part of many e-commerce platforms such as Taobao and Amazon. While in recent years much attention has been drawn to the consumer side, including canonical problems like CTR/CVR prediction, the advertiser side, which directly serves advertisers by providing them with marketing tools, is playing an increasingly important role. In sponsored search, bid keyword recommendation is the fundamental service. This paper addresses keyword matching, the primary step of keyword recommendation. Existing methods for keyword matching model relevance based on only a single type of relation among ads and keywords, such as query clicks or text similarity, and thus neglect the rich heterogeneous interactions hidden behind them. Filling this gap raises several challenges: 1) how to learn enriched and robust embeddings from complex interactions among various types of objects; and 2) how to conduct high-quality matching for new ads that usually lack sufficient data. To address these challenges, we develop a heterogeneous-graph-neural-network-based model for keyword matching named HetMatch, which has been deployed both online and offline on the core sponsored search platform of Alibaba Group. To extract enriched and robust embeddings from rich relations, we design a hierarchical structure that fuses and enhances the relevant neighborhood patterns at both the micro and the macro level. Moreover, by introducing a multi-view framework, the model is able to involve more positive samples for cold-start ads. Experimental results on a large-scale industrial dataset as well as online A/B tests demonstrate the effectiveness of HetMatch.
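To make the idea of fusing multiple relation types concrete, below is a minimal Python sketch of relation-wise neighbor aggregation for ad-keyword relevance. The relation names, tensor shapes, residual fusion, and dot-product scoring are illustrative assumptions; the actual HetMatch model uses a hierarchical micro/macro fusion and a multi-view training scheme not shown here.

```python
# Minimal sketch of relation-wise neighbor aggregation for ad-keyword matching.
# All shapes, relation names, and the scoring rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy embeddings for 3 ads and 4 bid keywords.
ad_emb = rng.normal(size=(3, dim))
kw_emb = rng.normal(size=(4, dim))

# Two relation types linking ads to keywords (e.g. query clicks, text similarity),
# stored as adjacency matrices of shape (num_ads, num_keywords).
relations = {
    "query_click": (rng.random((3, 4)) > 0.5).astype(float),
    "text_similar": (rng.random((3, 4)) > 0.5).astype(float),
}

def aggregate(ad_emb, kw_emb, relations):
    """Fuse per-relation neighbor averages into an enriched ad embedding."""
    views = []
    for adj in relations.values():
        deg = adj.sum(axis=1, keepdims=True) + 1e-9
        views.append(adj @ kw_emb / deg)          # mean of neighbors per relation
    fused = np.mean(views, axis=0)                # naive fusion across relations
    return ad_emb + fused                         # residual combination

enriched = aggregate(ad_emb, kw_emb, relations)
scores = enriched @ kw_emb.T                      # ad-keyword relevance scores
print(np.argsort(-scores, axis=1))                # keyword ranking per ad
```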
{"title":"Heterogeneous Graph Neural Networks for Large-Scale Bid Keyword Matching","authors":"Zongtao Liu, Bin Ma, Quanlian Liu, Jian Xu, Bo Zheng","doi":"10.1145/3459637.3481926","DOIUrl":"https://doi.org/10.1145/3459637.3481926","url":null,"abstract":"Digital advertising is a critical part of many e-commerce platforms such as Taobao and Amazon. While in recent years a lot of attention has been drawn to the consumer side including canonical problems like ctr/cvr prediction, the advertiser side, which directly serves advertisers by providing them with marketing tools, is now playing a more and more important role. When speaking of sponsored search, bid keyword recommendation is the fundamental service. This paper addresses the problem of keyword matching, the primary step of keyword recommendation. Existing methods for keyword matching merely consider modeling relevance based on a single type of relation among ads and keywords, such as query clicks or text similarity, which neglects rich heterogeneous interactions hidden behind them. To fill this gap, the keyword matching problem faces several challenges including: 1) how to learn enriched and robust embeddings from complex interactions among various types of objects; 2) how to conduct high-quality matching for new ads that usually lack sufficient data. To address these challenges, we develop a heterogeneous-graph-neural-network-based model for keyword matching named HetMatch, which has been deployed both online and offline at the core sponsored search platform of Alibaba Group. To extract enriched and robust embeddings among rich relations, we design a hierarchical structure to fuse and enhance the relevant neighborhood patterns both on the micro and the macro level. Moreover, by proposing a multi-view framework, the model is able to involve more positive samples for cold-start ads. Experimental results on a large-scale industrial dataset as well as online AB tests exhibit the effectiveness of HetMatch.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"17 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124144513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Agenda: Robust Personalized PageRanks in Evolving Graphs
Dingheng Mo, Siqiang Luo
Given a source node s and a target node t in a graph G, the Personalized PageRank (PPR) from s to t is the probability that a random walk starting from s terminates at t. PPR is a classic measure of the relevance between different nodes in a graph and has been applied in numerous real-world systems. However, existing techniques for PPR queries are not robust to dynamic real-world graphs, which typically evolve at different speeds. Their performance degrades significantly either at a lower graph evolving rate (e.g., many more queries than updates) or at a higher one. To address these deficiencies, we propose Agenda to efficiently process, with strong approximation guarantees, single-source PPR (SSPPR) queries on dynamically evolving graphs with various evolving speeds. Compared with previous methods, Agenda has significantly better workload robustness while ensuring the same result accuracy. Agenda also has theoretically guaranteed small query and update costs. Experiments on graphs with up to billions of edges show that Agenda significantly outperforms state-of-the-art methods for various query/update workloads, while maintaining better or comparable approximation accuracy.
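The PPR definition quoted in the abstract can be made concrete with a Monte Carlo sketch: simulate random walks that stop with a fixed probability at each step and count where they terminate. The stopping probability alpha, the toy graph, and the walk count are illustrative assumptions; Agenda's actual index and update machinery are not shown.

```python
# Monte Carlo sketch of the PPR definition above: simulate alpha-decay random
# walks from s and count where they terminate.
import random
from collections import Counter

graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}   # adjacency lists (assumed toy graph)
alpha = 0.15                                      # per-step stop probability (assumed)

def estimate_sspr(source, walks=100_000, seed=7):
    random.seed(seed)
    stops = Counter()
    for _ in range(walks):
        node = source
        while random.random() > alpha and graph[node]:
            node = random.choice(graph[node])
        stops[node] += 1
    return {t: c / walks for t, c in sorted(stops.items())}

print(estimate_sspr(0))   # approximate PPR(0, t) for every target t
```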
{"title":"Agenda: Robust Personalized PageRanks in Evolving Graphs","authors":"Dingheng Mo, Siqiang Luo","doi":"10.1145/3459637.3482317","DOIUrl":"https://doi.org/10.1145/3459637.3482317","url":null,"abstract":"Given a source node s and a target node t in a graph G, the Personalized PageRank (PPR) from s to t is the probability of a random walk starting from s terminates at t. PPR is a classic measure of the relevance among different nodes in a graph, and has been applied in numerous real-world systems. However, existing techniques for PPR queries are not robust to dynamic real-world graphs, which typically have different evolving speeds. Their performance is significantly degraded either at a lower graph evolving rate (e.g., much more queries than updates) or a higher rate. To address the above deficiencies, we propose Agenda to efficiently process, with strong approximation guarantees, the single-source PPR (SSPPR) queries on dynamically evolving graphs with various evolving speeds. Compared with previous methods, Agenda has significantly better workload robustness, while ensuring the same result accuracy. Agenda also has theoretically-guaranteed small query and update costs. Experiments on up to billion-edge scale graphs show that Agenda significantly outperforms state-of-the-art methods for various query/update workloads, while maintaining better or comparable approximation accuracies.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125566495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Jasmine: Exploring the Dependency-Aware Execution on Distributed Shared Memory
Xing Wei, Huiqi Hu, Xuan Zhou, Xuecheng Qi, Weining Qian, Jiang Wang, Aoying Zhou
Distributed shared memory abstraction can coordinate a cluster of machine nodes to empower performance-critical queries with scalable memory space and abundant parallelism. However, to deploy a query under such an abstraction, the general execution model simply expresses operators as multiple subtasks and schedules them in parallel, neglecting the vital dependencies between subtasks and data. In this paper, we conduct an in-depth study of the issues raised by ignoring these dependencies (i.e., low CPU utilization and poor data locality), and then propose a dependency-aware query execution model called Jasmine, which can (i) help users explicitly declare the dependencies and (ii) take these declared dependencies into account during execution to address the issues. We invite our audience to use the rich graphical interfaces to interact with Jasmine and explore dependency-aware query execution on distributed shared memory.
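As a rough illustration of dependency-aware scheduling, the sketch below declares dependencies between query subtasks and releases each subtask only once the subtasks it depends on have finished, running independent ones in the same wave. The subtask names and dependency edges are invented; Jasmine's actual interfaces and distributed runtime are not shown.

```python
# Dependency-aware scheduling sketch: subtasks whose declared dependencies are
# satisfied run in the same wave, instead of being scheduled blindly in sequence.
from graphlib import TopologicalSorter

# subtask -> set of subtasks it depends on (data it must wait for); names assumed
deps = {
    "scan_A": set(),
    "scan_B": set(),
    "join_AB": {"scan_A", "scan_B"},
    "aggregate": {"join_AB"},
}

ts = TopologicalSorter(deps)
ts.prepare()
wave = 0
while ts.is_active():
    ready = list(ts.get_ready())        # all subtasks whose dependencies are done
    print(f"wave {wave}: run in parallel -> {ready}")
    for task in ready:                  # a real engine would dispatch these
        ts.done(task)                   # to worker nodes concurrently
    wave += 1
```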
{"title":"Jasmine: Exploring the Dependency-Aware Execution on Distributed Shared Memory","authors":"Xing Wei, Huiqi Hu, Xuan Zhou, Xuecheng Qi, Weining Qian, Jiang Wang, Aoying Zhou","doi":"10.1145/3459637.3481993","DOIUrl":"https://doi.org/10.1145/3459637.3481993","url":null,"abstract":"Distributed shared memory abstraction can coordinate a cluster of machine nodes to empower performance-critical queries with the scalable memory space and abundant parallelism. But to deploy the query under such an abstraction, the general execution model just makes operators expressed as multiple subtasks and sequentially schedule them in parallel, while neglecting those vital dependencies between subtasks and data. In this paper, we conduct the in-depth researches about the issues (i.e., low CPU Utilization and poor data locality) raised by the ignorance of dependencies, and then propose a dependency-aware query execution model called Jasmine, which can (i) help users explicitly declare the dependencies and (ii) take these declared dependencies into the consideration of execution to address the issues. We invite our audience to use the rich graphical interfaces to interact with Jasmine to explore the dependency-aware query execution on distributed shared memory.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125993559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Metric Sentiment Learning for Label Representation
Chengyu Song, Fei Cai, Jianming Zheng, Wanyu Chen, Zhiqiang Pan
Label representation aims to generate a so-called verbalizer for an input text, and has broad applications in text classification, event detection, question answering, etc. Previous works on label representation, especially in the few-shot setting, mainly define the verbalizers manually, which is accurate but time-consuming. Other models fail to correctly produce antonymous verbalizers for two semantically opposite classes. Thus, in this paper, we propose a metric sentiment learning framework (MSeLF) to generate the verbalizers automatically, which can accurately capture the sentiment differences between verbalizers. In detail, MSeLF consists of two major components, i.e., the contrastive mapping learning (CML) module and the equal-gradient verbalizer acquisition (EVA) module. CML learns a transformation matrix to project the initial word embeddings into antonym-aware embeddings by enlarging the distance between antonyms. After that, in the antonym-aware embedding space, EVA first takes a pair of antonymous words as the verbalizers for the two opposite classes and then applies a sentiment transition vector to generate verbalizers for the intermediate classes. We use the generated verbalizers for the downstream text classification task in a few-shot setting on two publicly available fine-grained datasets. The results indicate that our proposal outperforms the state-of-the-art baselines in terms of accuracy. In addition, we find that CML can be used as a flexible plug-in component in other verbalizer acquisition approaches.
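A toy reading of the equal-gradient step: walk in evenly spaced steps along the vector between two antonymous verbalizers and pick the nearest vocabulary word for each intermediate class. The 2-D embeddings and five-class setup below are invented for illustration, and the sketch skips the CML module that would first learn the antonym-aware space.

```python
# Equal-gradient verbalizer sketch: interpolate between antonym embeddings and
# snap each intermediate point to its nearest vocabulary word. All embeddings
# are invented for illustration.
import numpy as np

vocab = {
    "terrible": np.array([-1.0, 0.0]),
    "bad":      np.array([-0.5, 0.1]),
    "okay":     np.array([ 0.0, 0.0]),
    "good":     np.array([ 0.5, 0.1]),
    "great":    np.array([ 1.0, 0.0]),
}

def equal_gradient_verbalizers(neg="terrible", pos="great", num_classes=5):
    v_neg, v_pos = vocab[neg], vocab[pos]
    step = (v_pos - v_neg) / (num_classes - 1)       # sentiment transition vector
    picks = []
    for k in range(num_classes):
        target = v_neg + k * step
        word = min(vocab, key=lambda w: np.linalg.norm(vocab[w] - target))
        picks.append(word)
    return picks

print(equal_gradient_verbalizers())   # ['terrible', 'bad', 'okay', 'good', 'great']
```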
{"title":"Metric Sentiment Learning for Label Representation","authors":"Chengyu Song, Fei Cai, Jianming Zheng, Wanyu Chen, Zhiqiang Pan","doi":"10.1145/3459637.3482369","DOIUrl":"https://doi.org/10.1145/3459637.3482369","url":null,"abstract":"Label representation aims to generate a so-called verbalizer to an input text, which has a broad application in the field of text classification, event detection, question answering, etc. Previous works on label representation, especially in a few-shot setting, mainly define the verbalizers manually, which is accurate but time-consuming. Other models fail to correctly produce antonymous verbalizers for two semantically opposite classes. Thus, in this paper, we propose a metric sentiment learning framework (MSeLF) to generate the verbalizers automatically, which can capture the sentiment differences between the verbalizers accurately. In detail, MSeLF consists of two major components, i.e., the contrastive mapping learning (CML) module and the equal-gradient verbalizer acquisition (EVA) module. CML learns a transformation matrix to project the initial word embeddings to the antonym-aware embeddings by enlarging the distance between the antonyms. After that, in the antonym-aware embedding space, EVA first takes a pair of antonymous words as verbalizers for two opposite classes and then applies a sentiment transition vector to generate verbalizers for intermediate classes. We use the generated verbalizers for the downstream text classification task in a few-shot setting on two publicly available fine-grained datasets. The results indicate that our proposal outperforms the state-of-the-art baselines in terms of accuracy. In addition, we find CML can be used as a flexible plug-in component in other verbalizer acquisition approaches.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115764549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Prohibited Item Detection on Heterogeneous Risk Graphs
Yugang Ji, C. Shi, Xiao Wang
Prohibited item detection, which aims to detect illegal items hidden on e-commerce platforms, plays a significant role in mitigating risks and preventing crimes in online shopping. While traditional solutions usually focus on mining evidence from independent items, they cannot effectively utilize the rich structural relevance among different items. A naive idea is to directly deploy existing supervised graph neural networks to learn node representations for item classification. However, the very few manually labeled items with various risk patterns introduce two essential challenges: (1) How to enhance the representations of the enormous number of unlabeled items? (2) How to enrich the supervision in this few-labeled but multi-pattern business scenario? In this paper, we construct item logs as a Heterogeneous Risk Graph (HRG) and propose the novel Heterogeneous Self-supervised Prohibited item Detection model (HSPD) to overcome these challenges. HSPD first designs a heterogeneous self-supervised learning model, which treats multiple semantics as supervision to enhance item representations. Then, it presents directed pairwise labeling to learn the distance from candidates to their most relevant prohibited seeds, which tackles the binary-labeled, multi-patterned risks. Finally, HSPD integrates self-training mechanisms to iteratively expand confident pseudo labels and enrich supervision. Extensive offline and online experimental results on three real-world HRGs demonstrate that HSPD consistently outperforms the state-of-the-art alternatives.
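The self-training step mentioned above can be illustrated with a minimal loop that refits a weak classifier while promoting confident predictions on unlabeled items to pseudo labels. The nearest-centroid classifier, synthetic 2-D "item embeddings", and confidence threshold are stand-ins, not HSPD's graph-based components.

```python
# Minimal self-training sketch: confident predictions on unlabeled items are
# promoted to pseudo-labels over a few rounds. All data and thresholds are toys.
import numpy as np

rng = np.random.default_rng(1)
labeled_x = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
labeled_y = np.array([0] * 5 + [1] * 5)             # 0 = normal, 1 = prohibited
unlabeled_x = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])

def predict(x, centroids):
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    margin = np.abs(d[:, 0] - d[:, 1])              # crude confidence proxy
    return d.argmin(axis=1), margin

for it in range(3):                                 # a few self-training rounds
    centroids = np.vstack([labeled_x[labeled_y == c].mean(axis=0) for c in (0, 1)])
    if len(unlabeled_x) == 0:
        break
    pred, margin = predict(unlabeled_x, centroids)
    confident = margin > 2.0                        # promote confident predictions
    labeled_x = np.vstack([labeled_x, unlabeled_x[confident]])
    labeled_y = np.concatenate([labeled_y, pred[confident]])
    unlabeled_x = unlabeled_x[~confident]
    print(f"round {it}: promoted {confident.sum()} pseudo-labels")
```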
{"title":"Prohibited Item Detection on Heterogeneous Risk Graphs","authors":"Yugang Ji, C. Shi, Xiao Wang","doi":"10.1145/3459637.3481945","DOIUrl":"https://doi.org/10.1145/3459637.3481945","url":null,"abstract":"Prohibited item detection, which aims to detect illegal items hidden on e-commerce platforms, plays a significant role in evading risks and preventing crimes for online shopping. While traditional solutions usually focus on mining evidence from independent items, they cannot effectively utilize the rich structural relevance among different items. A naive idea is to directly deploy existing supervised graph neural networks to learn node representations for item classification. However, the very few manually labeled items with various risk patterns introduce two essential challenges: (1) How to enhance the representations of enormous unlabeled items? (2) How to enrich the supervised information in this few-labeled but multiple-pattern business scenario? In this paper, we construct item logs as a Heterogeneous Risk Graph (HRG), and propose the novel Heterogeneous Self-supervised Prohibited item Detection model (HSPD) to overcome these challenges. HSPD first designs the heterogeneous self-supervised learning model, which treats multiple semantics as the supervision to enhance item representations. Then, it presents the directed pairwise labeling to learn the distance from candidates to their most relevant prohibited seeds, which tackles the binary-labeled multi-patterned risks. Finally, HSPD integrates with self-training mechanisms to iteratively expand confident pseudo labels for enriching supervision. The extensive offline and online experimental results on three real-world HRGs demonstrate that HSPD consistently outperforms the state-of-the-art alternatives.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131349284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Smoothing with Fake Label
Ziyang Luo, Yadong Xi, Xiaoxi Mao
Label Smoothing is a widely used technique in many areas. It can prevent the network from becoming over-confident. However, it assumes that the prior distribution over all classes is uniform. Here, we abandon this assumption and propose a new smoothing method, called Smoothing with Fake Label, which shares a part of the prediction probability with a new fake class. Our experimental results show that the method can increase the performance of the models on most tasks and outperforms Label Smoothing on text classification and cross-lingual transfer tasks.
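One plausible reading of the method, shown below, contrasts a common label-smoothing target with a fake-label variant that gives the entire smoothing mass to one extra class. The smoothing coefficient and four-class example are illustrative; the paper's exact formulation may differ.

```python
# Contrast between one common label-smoothing target and a fake-label variant
# that assigns the smoothing mass to an extra (K+1)-th class.
import numpy as np

def label_smoothing(true_class, num_classes, eps=0.1):
    target = np.full(num_classes, eps / (num_classes - 1))
    target[true_class] = 1.0 - eps
    return target

def fake_label_smoothing(true_class, num_classes, eps=0.1):
    target = np.zeros(num_classes + 1)      # last slot is the fake class
    target[true_class] = 1.0 - eps
    target[-1] = eps                        # all smoothing mass goes to it
    return target

print(label_smoothing(2, 4))       # approx. [0.033 0.033 0.9 0.033]
print(fake_label_smoothing(2, 4))  # [0.  0.  0.9 0.  0.1]
```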
{"title":"Smoothing with Fake Label","authors":"Ziyang Luo, Yadong Xi, Xiaoxi Mao","doi":"10.1145/3459637.3482184","DOIUrl":"https://doi.org/10.1145/3459637.3482184","url":null,"abstract":"Label Smoothing is a widely used technique in many areas. It can prevent the network from being over-confident. However, it hypotheses that the prior distribution of all classes is uniform. Here, we decide to abandon this hypothesis and propose a new smoothing method, called Smoothing with Fake Label. It shares a part of the prediction probability to a new fake class. Our experiment results show that the method can increase the performance of the models on most tasks and outperform the Label Smoothing on text classification and cross-lingual transfer tasks.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"313 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131945048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attention Based Dynamic Graph Learning Framework for Asset Pricing
Ajim Uddin, Xinyuan Tao, Dantong Yu
Recent studies suggest that financial networks play an essential role in asset valuation and investment decisions. Unlike road networks, financial networks are neither given nor static, posing significant challenges in learning meaningful networks and promoting their applications in price prediction. In this paper, we first apply the attention mechanism to connect the "dots" (firms) and learn dynamic network structures among stocks over time. Next, the end-to-end graph neural network pipeline diffuses and propagates the firms' accounting fundamentals into the learned networks and ultimately predicts stocks' future returns. The proposed model reduces the prediction errors by 6% compared to the state-of-the-art models. Our results are robust across different assessment measures. We also show that portfolios based on our model outperform the S&P-500 index by 34% in terms of Sharpe Ratio, suggesting that our model is better at capturing the dynamic inter-connections among firms and at identifying stocks that recover quickly from major events. Further investigation of the learned networks reveals that the network structure aligns closely with market conditions. Finally, with an ablation study, we investigate different alternative versions of our model and the contribution of each component.
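As a rough sketch of attention-induced graph learning, the snippet below scores all firm pairs with a single scaled dot-product attention head, row-normalizes the scores into a dense adjacency, and diffuses accounting fundamentals over it. Dimensions, the single head, and the one-step propagation are simplifications of the end-to-end pipeline described above.

```python
# Attention-induced firm graph sketch: pairwise attention scores become a
# learned adjacency over which fundamentals are diffused. All sizes are toys.
import numpy as np

rng = np.random.default_rng(3)
n_firms, d_emb, d_feat = 5, 8, 4

firm_emb = rng.normal(size=(n_firms, d_emb))        # learned firm representations
fundamentals = rng.normal(size=(n_firms, d_feat))   # accounting features per firm
Wq = rng.normal(size=(d_emb, d_emb))
Wk = rng.normal(size=(d_emb, d_emb))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scores = (firm_emb @ Wq) @ (firm_emb @ Wk).T / np.sqrt(d_emb)
adjacency = softmax(scores, axis=1)                  # dynamic, fully learned graph
propagated = adjacency @ fundamentals                # one step of diffusion
print(adjacency.round(2))
print(propagated.shape)                              # (5, 4)
```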
{"title":"Attention Based Dynamic Graph Learning Framework for Asset Pricing","authors":"Ajim Uddin, Xinyuan Tao, Dantong Yu","doi":"10.1145/3459637.3482413","DOIUrl":"https://doi.org/10.1145/3459637.3482413","url":null,"abstract":"Recent studies suggest that financial networks play an essential role in asset valuation and investment decisions. Unlike road networks, financial networks are neither given nor static, posing significant challenges in learning meaningful networks and promoting their applications in price prediction. In this paper, we first apply the attention mechanism to connect the \"dots\" (firms) and learn dynamic network structures among stocks over time. Next, the end-to-end graph neural networks pipeline diffuses and propagates the firms' accounting fundamentals into the learned networks and ultimately predicts stock future returns. The proposed model reduces the prediction errors by 6% compared to the state-of-the-art models. Our results are robust with different assessment measures. We also show that portfolios based on our model outperform the S&P-500 index by 34% in terms of Sharpe Ratio, suggesting that our model is better at capturing the dynamic inter-connection among firms and identifying stocks with fast recovery from major events. Further investigation on the learned networks reveals that the network structure aligns closely with the market conditions. Finally, with an ablation study, we investigate different alternative versions of our model and the contribution of each component.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130010299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
AGCNT: Adaptive Graph Convolutional Network for Transformer-based Long Sequence Time-Series Forecasting
Hongyang Su, Xiaolong Wang, Yang Qin
Long sequence time-series forecasting (LSTF) plays an important role in a variety of real-world application scenarios, such as electricity forecasting, weather forecasting, and traffic flow forecasting. Transformer-based models have achieved outstanding results on LSTF tasks, as they can reduce model complexity while maintaining stable prediction accuracy. Nevertheless, some issues still limit the performance of transformer-based models on LSTF tasks: (i) the potential correlation between sequences is not considered; (ii) the inherent encoder-decoder structure is difficult to extend once it has been optimized for complexity. To solve these two problems, we propose a transformer-based model, named AGCNT, which is efficient and can capture the correlation between the sequences in the multivariate LSTF task without causing a memory bottleneck. Specifically, AGCNT has several characteristics: (i) a probsparse adaptive graph self-attention, which maps long sequences into a low-dimensional dense graph structure with adaptive graph generation and captures the relationships between sequences with an adaptive graph convolution; (ii) the stacked encoder with distilling probsparse graph self-attention integrates the graph attention mechanism and retains the dominant attention of the cascade layer, which preserves the correlation between sparse queries from long sequences; (iii) the stacked decoder with generative inference generates all prediction values in one forward operation, which improves the inference speed of long-term predictions. Experimental results on 4 large-scale datasets demonstrate that AGCNT outperforms state-of-the-art baselines.
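The adaptive-graph component can be illustrated generically: give each series a learnable node embedding and derive a row-normalized adjacency from pairwise similarities, in the spirit of common adaptive-graph constructions rather than AGCNT's exact probsparse attention. Everything below (dimensions, ReLU similarity, softmax normalization) is an assumption for illustration.

```python
# Generic adaptive graph generation for multivariate series: adjacency is
# derived from learnable node embeddings, not given in advance.
import numpy as np

rng = np.random.default_rng(5)
n_series, d_node = 6, 4

node_emb = rng.normal(size=(n_series, d_node))       # learnable in a real model

def softmax_rows(x):
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

similarity = np.maximum(node_emb @ node_emb.T, 0.0)  # ReLU keeps useful edges
adaptive_adj = softmax_rows(similarity)              # row-normalised dense graph

x_t = rng.normal(size=(n_series, 1))                 # one time step of inputs
print((adaptive_adj @ x_t).shape)                    # graph convolution step: (6, 1)
```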
{"title":"AGCNT: Adaptive Graph Convolutional Network for Transformer-based Long Sequence Time-Series Forecasting","authors":"Hongyang Su, Xiaolong Wang, Yang Qin","doi":"10.1145/3459637.3482054","DOIUrl":"https://doi.org/10.1145/3459637.3482054","url":null,"abstract":"Long sequence time-series forecasting(LSTF) plays an important role in a variety of real-world application scenarios, such as electricity forecasting, weather forecasting, and traffic flow forecasting. It has previously been observed that transformer-based models have achieved outstanding results on LSTF tasks, which can reduce the complexity of the model and maintain stable prediction accuracy. Nevertheless, there are still some issues that limit the performance of transformer-based models for LSTF tasks: (i) the potential correlation between sequences is not considered; (ii) the inherent structure of encoder-decoder is difficult to expand after being optimized from the aspect of complexity. In order to solve these two problems, we propose a transformer-based model, named AGCNT, which is efficient and can capture the correlation between the sequences in the multivariate LSTF task without causing the memory bottleneck. Specifically, AGCNT has several characteristics: (i) a probsparse adaptive graph self-attention, which maps long sequences into a low-dimensional dense graph structure with an adaptive graph generation and captures the relationships between sequences with an adaptive graph convolution; (ii) the stacked encoder with distilling probsparse graph self-attention integrates the graph attention mechanism and retains the dominant attention of the cascade layer, which preserves the correlation between sparse queries from long sequences; (iii) the stacked decoder with generative inference generates all prediction values in one forward operation, which can improve the inference speed of long-term predictions. Experimental results on 4 large-scale datasets demonstrate the AGCNT outperforms state-of-the-art baselines.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"2169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130068609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Efficient Learning to Learn a Robust CTR Model for Web-scale Online Sponsored Search Advertising
Xin Wang, Peng Yang, S. Chen, Lin Liu, Liang Zhao, Jiacheng Guo, Mingming Sun, Ping Li
Click-through rate (CTR) prediction is crucial for online sponsored search advertising. Several successful CTR models have been adopted in the industry, including regularized logistic regression (LR). Nonetheless, the learning process suffers from two limitations: 1) feature crosses for high-order information may generate trillions of features, which are sparse in online learning examples; 2) rapid changes in the data distribution make accurate learning challenging, since the model has to adapt quickly to the new data. Moreover, existing adaptive optimizers are ineffective at handling the sparsity issue for high-dimensional features. In this paper, we propose to learn an optimizer in a meta-learning scenario, where the optimizer is learned on prior data and can be easily adapted to new data. We first build a low-dimensional feature embedding on prior data to encode the associations among features. Then, the gradients on new data can be decomposed into the low-dimensional space, smoothing the parameter updates and relieving the sparsity. Note that this technology can be deployed in a distributed system to ensure efficient online learning over trillions of parameters. We conduct extensive experiments to evaluate the algorithm in terms of prediction accuracy and actual revenue. Experimental results demonstrate that the proposed framework achieves promising predictions on the new data. The final online revenue is noticeably improved compared to the baseline. This framework was initially deployed in Baidu Search Ads (a.k.a. Phoenix Nest) in 2014 and is currently still being used in certain modules of Baidu's ads systems.
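The gradient-decomposition idea can be sketched as projecting a sparse high-dimensional gradient into a low-dimensional space defined by feature embeddings learned on prior data, then mapping it back so that correlated features share the update signal. The projection rule and dimensions below are assumptions, not the paper's meta-learned optimizer.

```python
# Toy gradient smoothing via a low-dimensional feature embedding: a sparse
# gradient is projected down and back up, spreading signal across features.
import numpy as np

rng = np.random.default_rng(4)
n_features, k = 1000, 16

# Low-dimensional feature embeddings learned on prior data (here: random).
P = rng.normal(size=(n_features, k)) / np.sqrt(k)

# A sparse gradient from one online example: only a few features are active.
grad = np.zeros(n_features)
active = rng.choice(n_features, size=5, replace=False)
grad[active] = rng.normal(size=5)

low_dim_grad = P.T @ grad            # decompose into the low-dimensional space
smoothed_grad = P @ low_dim_grad     # map back: correlated features get updates too

print(np.count_nonzero(grad), np.count_nonzero(smoothed_grad))  # 5 vs (almost surely) 1000
```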
{"title":"Efficient Learning to Learn a Robust CTR Model for Web-scale Online Sponsored Search Advertising","authors":"Xin Wang, Peng Yang, S. Chen, Lin Liu, Liang Zhao, Jiacheng Guo, Mingming Sun, Ping Li","doi":"10.1145/3459637.3481912","DOIUrl":"https://doi.org/10.1145/3459637.3481912","url":null,"abstract":"Click-through rate (CTR) prediction is crucial for online sponsored search advertising. Several successful CTR models have been adopted in the industry, including the regularized logistic regression (LR). Nonetheless, the learning process suffers from two limitations: 1) Feature crosses for high-order information may generate trillions of features, which are sparse for online learning examples; 2) Rapid changing of data distribution brings challenges to the accurate learning since the model has to perform a fast adaptation on the new data. Moreover, existing adaptive optimizers are ineffective in handling the sparsity issue for high-dimensional features. In this paper, we propose to learn an optimizer in a meta-learning scenario, where the optimizer is learned on prior data and can be easily adapted to the new data. We firstly build a low-dimensional feature embedding on prior data to encode the association among features. Then, the gradients on new data can be decomposed into the low-dimensional space, enabling the parameter update smoothed and relieving the sparsity. Note that this technology could be deployed into a distributed system to ensure efficient online learning on the trillions-level parameters. We conduct extensive experiments to evaluate the algorithm in terms of prediction accuracy and actual revenue. Experimental results demonstrate that the proposed framework achieves a promising prediction on the new data. The final online revenue is noticeably improved compared to the baseline. This framework was initially deployed in Baidu Search Ads (a.k.a. Phoenix Nest) in 2014 and is currently still being used in certain modules of Baidu's ads systems.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134552841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A Knowledge-Aware Recommender with Attention-Enhanced Dynamic Convolutional Network
Yi Liu, Bohan Li, Yalei Zang, Aoran Li, Hongzhi Yin
Sequential recommendation systems seek to learn users' preferences in order to predict their next actions based on recently engaged items. Users' static behavior takes a long time to form, whereas short-term interactions with items usually reflect immediate needs and are more variable. RNN-based models are constrained by a strong ordering assumption and struggle to model complex and changeable data flexibly. Most CNN-based models are limited to fixed convolutional kernels. All of these methods are suboptimal when modeling the dynamics of item-to-item transitions. It is difficult to describe items with complex relations and to extract fine-grained user preferences from the interaction sequence. To address these issues, we propose a knowledge-aware sequential recommender with an attention-enhanced dynamic convolutional network (KAeDCN). Our model combines the dynamic convolutional network with attention mechanisms to capture changing dependencies in the sequence. Meanwhile, we enhance the representations of items with Knowledge Graph (KG) information through an information fusion module to capture fine-grained user preferences. Experiments on four public datasets demonstrate that KAeDCN outperforms most of the state-of-the-art sequential recommenders. Furthermore, the experimental results also prove that KAeDCN can effectively enhance the representations of items and improve the extraction of sequential dependencies.
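A minimal sketch of a dynamic (input-conditioned) convolution over an item sequence is given below: the kernel weights at each step come from a softmax over a linear function of the current item embedding instead of being fixed. The kernel size, weight-generating matrix, and the absence of the KG fusion module are all simplifications relative to KAeDCN.

```python
# Dynamic convolution sketch over a toy item sequence: per-step kernel weights
# are generated from the current item embedding rather than learned as constants.
import numpy as np

rng = np.random.default_rng(2)
seq_len, dim, kernel = 7, 8, 3

items = rng.normal(size=(seq_len, dim))            # embeddings of interacted items
Wk = rng.normal(size=(dim, kernel))                # generates per-step kernel weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

outputs = np.zeros_like(items)
for t in range(seq_len):
    weights = softmax(items[t] @ Wk)               # kernel depends on the input
    for j in range(kernel):                        # causal window: t-kernel+1 .. t
        idx = t - (kernel - 1) + j
        if idx >= 0:
            outputs[t] += weights[j] * items[idx]

print(outputs.shape)                               # (7, 8) contextualised sequence
```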
{"title":"A Knowledge-Aware Recommender with Attention-Enhanced Dynamic Convolutional Network","authors":"Yi Liu, Bohan Li, Yalei Zang, Aoran Li, Hongzhi Yin","doi":"10.1145/3459637.3482406","DOIUrl":"https://doi.org/10.1145/3459637.3482406","url":null,"abstract":"Sequential recommendation systems seek to learn users' preferences to predict their next actions based on the items engaged recently. Static behavior of users requires a long time to form, but short-term interactions with items usually meet some actual needs in reality and are more variable. RNN-based models are always constrained by the strong order assumption and are hard to model the complex and changeable data flexibly. Most of the CNN-based models are limited to the fixed convolutional kernel. All these methods are suboptimal when modeling the dynamics of item-to-item transitions. It is difficult to describe the items with complex relations and extract the fine-grained user preferences from the interaction sequence. To address these issues, we propose a knowledge-aware sequential recommender with the attention-enhanced dynamic convolutional network (KAeDCN). Our model combines the dynamic convolutional network with attention mechanisms to capture changing dependencies in the sequence. Meanwhile, we enhance the representations of items with Knowledge Graph (KG) information through an information fusion module to capture the fine-grained user preferences. The experiments on four public datasets demonstrate that KAeDCN outperforms most of the state-of-the-art sequential recommenders. Furthermore, experimental results also prove that KAeDCN can enhance the representations of items effectively and improve the extractability of sequential dependencies.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134349033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6