Proceedings of the 30th ACM International Conference on Information & Knowledge Management: Latest Papers

Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482335

Weiyu Ju, Wei Bao, Liming Ge, Dong Yuan

Recent advances in Deep Neural Networks (DNNs) have dramatically improved the accuracy of DNN inference, but they have also increased latency. In this paper, we investigate how to utilize early exit, a novel method that allows inference to exit at earlier exit points at the cost of an acceptable loss of accuracy. Scheduling the optimal exit point on a per-instance basis is challenging because the realized performance (i.e., confidence and latency) of each exit point is random and its statistics vary across scenarios. Moreover, the performance of the exit points is interdependent, further complicating the problem. Therefore, the optimal exit scheduling decision cannot be known in advance but must be learned online. To this end, we propose Dynamic Early Exit (DEE), a real-time online learning algorithm based on contextual bandit analysis. DEE observes the performance at each exit point as context and decides whether to exit or keep processing. Unlike in standard contextual bandit analyses, the rewards of the decisions in our problem are temporally dependent. Furthermore, the earlier exit points are inevitably explored more than the later ones, which creates an unbalanced exploration-exploitation trade-off. DEE addresses these challenges, and its regret per inference asymptotically approaches zero. We compare DEE with four benchmark schemes in a real-world experiment. The results show that DEE can improve the overall performance by up to 98.1% compared to the best benchmark scheme.
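
The exit-or-continue decision described in this abstract can be made concrete with a minimal Python sketch. This is not the DEE algorithm: it is a fixed confidence-threshold rule of the kind an online learner would replace. The function name `run_with_early_exit`, the threshold value, and the (confidence, latency) numbers are all illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

def run_with_early_exit(
    exits: List[Tuple[float, float]],  # hypothetical (confidence, cumulative latency in ms) per exit point
    threshold: float = 0.9,            # assumed fixed threshold; DEE learns its decisions online instead
) -> Tuple[int, float, float]:
    """Return (exit index, confidence, latency) for the first exit whose confidence meets the threshold."""
    if not exits:
        raise ValueError("need at least one exit point")
    for i, (confidence, latency) in enumerate(exits):
        # Observe this exit's performance, then decide: exit or keep processing.
        if confidence >= threshold:
            return i, confidence, latency
    # No early exit qualified, so inference runs to the final exit point.
    last = len(exits) - 1
    return last, exits[last][0], exits[last][1]

# Made-up observations for one input instance.
observed = [(0.55, 4.0), (0.93, 9.0), (0.99, 20.0)]
print(run_with_early_exit(observed))
```

A static threshold like this cannot adapt when confidence and latency statistics shift between scenarios, which is the gap an online method is meant to close.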

Citations: 7

Multi-modal Dictionary BERT for Cross-modal Video Search in Baidu Advertising

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3481937

Tan Yu, Yi Yang, Yi Li, Lin Liu, Mingming Sun, Ping Li

Video advertisements are popular with advertisers because of their appeal. Baidu, one of the leading search advertising platforms in China, is putting more and more effort into video advertisements for its advertising customers. Search-based video advertisement display is, in essence, a cross-modal retrieval problem, which is normally tackled with joint embedding methods. However, because they lack interactions between text features and image features, joint embedding methods cannot achieve accuracy as high as their attention-based counterparts. Inspired by the great success of BERT in NLP tasks, many cross-modal BERT models have emerged and achieved excellent performance in cross-modal retrieval. Last year, Baidu also launched a cross-modal BERT, CAN, on its video advertisement platform and achieved considerably better performance than the previous joint-embedding model. In this paper, we present our recent work on video advertisement retrieval, the Multi-modal Dictionary BERT (MDBERT) model. Compared with CAN and other cross-modal BERT models, MDBERT integrates a joint dictionary that is shared between video features and word features. It maps relevant word features and video features to the same codeword and thus fosters effective cross-modal attention. To support end-to-end training, we propose to soften the codeword assignment. Meanwhile, to improve inference efficiency, we adopt product quantization to achieve a fine-grained partition of the feature space at low cost. After the launch of MDBERT on the Baidu video advertising platform, the conversion ratio (CVR) increased by 3.34%, bringing a considerable revenue boost for advertisers on Baidu.
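
The two ideas named at the end of this abstract, product quantization and a softened codeword assignment, can be illustrated with a short NumPy sketch. It shows the generic techniques only. The function name `soft_pq_assign`, the `temperature` parameter, the softmax over negative squared distances, and the codebook sizes are assumptions for illustration and are not MDBERT's actual formulation.

```python
import numpy as np

def soft_pq_assign(x, codebooks, temperature=1.0):
    """Softly assign each sub-vector of x to the codewords of its sub-codebook.

    x:         feature vector of shape (M * d,)
    codebooks: array of shape (M, K, d): M sub-spaces with K codewords each
    Returns (weights, hard): weights has shape (M, K) with rows summing to 1,
    hard has shape (M,) with the index of the nearest codeword per sub-space.
    """
    M, K, d = codebooks.shape
    sub = x.reshape(M, d)                                  # split x into M sub-vectors
    dist = ((sub[:, None, :] - codebooks) ** 2).sum(-1)    # squared distances, shape (M, K)
    logits = -dist / temperature                           # closer codewords get larger logits
    logits -= logits.max(axis=1, keepdims=True)            # stabilise the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, dist.argmin(axis=1)

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 8, 16))                    # hypothetical: 4 sub-spaces, 8 codewords, 16 dims
x = codebooks[np.arange(4), [3, 0, 7, 2]].reshape(-1)      # build x from known codewords
weights, hard = soft_pq_assign(x, codebooks, temperature=0.1)
print(hard)
```

With M sub-codebooks of K codewords each, only M * K codewords are stored, yet the feature space is split into K^M cells, which is how product quantization obtains a fine partition cheaply. The soft weights are differentiable with respect to the codebooks, whereas a hard argmin is not.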

Citations: 10

Enhancing Explicit and Implicit Feature Interactions via Information Sharing for Parallel Deep CTR Models

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3481915

Bo Chen, Yichao Wang, Zhirong Liu, Ruiming Tang, Wei Guo, Hongkun Zheng, Weiwei Yao, Muyu Zhang, Xiuqiang He

Effectively modeling feature interactions is crucial for CTR prediction in industrial recommender systems. State-of-the-art deep CTR models with a parallel structure (e.g., DCN) learn explicit and implicit feature interactions through independent parallel networks. However, these models suffer from trivial sharing issues, namely insufficient sharing in hidden layers and excessive sharing in network input, which limit the model's expressiveness and effectiveness. Therefore, to enhance information sharing between explicit and implicit feature interactions, we propose a novel deep CTR model, EDCN. EDCN introduces two advanced modules, a bridge module and a regulation module, which work collaboratively to capture layer-wise interactive signals and learn discriminative feature distributions for each hidden layer of the parallel networks. Furthermore, the two modules are lightweight and model-agnostic, so they generalize well to mainstream parallel deep CTR models. Extensive experiments and studies demonstrate the effectiveness of EDCN on two public datasets and one industrial dataset. Moreover, the compatibility of the two modules with various parallel-structured models is verified, and they have been deployed on Huawei's online advertising platform, where a one-month A/B test shows improvements over the base parallel-structured model of 7.30% and 4.85% in terms of CTR and eCPM, respectively.

Citations: 25

SCMGR

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482476

Huanyu Liu, Ruifang He, Liangliang Zhao, Haocheng Wang, Ruifang Wang

Social summarization aims to produce a concise summary that describes the core content of a collection of posts on a specific topic. Existing methods tend to produce sparse or ambiguous representations of posts because they use only short and informal text content. Recent studies use social relations to improve the diversity of summaries, yet they model social relations as a regularization term, which has poor flexibility and generalization. These methods cannot capture the deep semantic and social interactions among posts, so summaries still suffer from redundancy. We propose to use Social Context and Multi-Granularity Relations (SCMGR) to improve unsupervised social summarization. It learns more informative representations of posts by considering both text semantics and social structure information, without any annotated data. First, we design two sociologically motivated meta-paths to construct a social context graph among posts, and adopt a graph convolutional network to aggregate social context information from neighbors. Second, we design a multi-granularity relation decoder to capture deeper semantic and social interactions from the post-word and post-post aspects, respectively, which provides guidance for summary selection from the semantic and social structure perspectives. Finally, a sparse reconstruction-based extractor selects the posts that can best reconstruct the original content and social network structure as the summary. Our approach improves the coverage and diversity of summaries. Experimental results on both English and Chinese corpora demonstrate the effectiveness of our model.

Citations: 0

Understanding the Property of Long Term Memory for the LSTM with Attention Mechanism

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482399

Wendong Zheng, Putian Zhao, Kai Huang, Gang Chen

Recent trends of combining LSTM networks with different attention mechanisms in time series forecasting have led researchers to consider the attention module an essential component. While existing studies have shown the effectiveness of the attention mechanism through visualization experiments, the rationale behind its outstanding performance in learning long-term dependencies has remained obscure. In this paper, we address this fundamental question by conducting a thorough investigation of the memory property of LSTM networks with an attention mechanism. We present a theoretical analysis of LSTM integrated with an attention mechanism and demonstrate that it can generate an adaptive decay rate that dynamically controls memory decay according to the obtained attention score. In particular, our theory shows that the attention mechanism yields significantly slower decay than the exponential decay rate of a standard LSTM. Experimental results on four real-world time series datasets demonstrate the superiority of the attention mechanism in maintaining long-term memory compared to state-of-the-art methods, and further corroborate our theoretical analysis.

Citations: 11

HierST: A Unified Hierarchical Spatial-temporal Framework for COVID-19 Trend Forecasting

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3481927

Shun Zheng, Zhifeng Gao, Wei Cao, Jiang Bian, Tie-Yan Liu

The outbreak of the COVID-19 pandemic has greatly affected the world and our daily lives. To combat this pandemic efficiently, governments usually need to coordinate essential resources across multiple regions and adjust intervention policies at the right time, both of which call for accurate and robust forecasting of future epidemic trends. However, designing such a forecasting system is non-trivial, since we need to handle all kinds of locations at different administrative levels, which exhibit quite different epidemic-evolving patterns. Moreover, there are dynamic and volatile correlations of pandemic conditions among these locations, which further increase the difficulty of forecasting. With these challenges in mind, we develop a novel spatial-temporal forecasting framework. First, to accommodate all kinds of locations at different administrative levels, we propose a unified hierarchical view, which mimics the aggregation procedure of pandemic statistics. This view motivates us to facilitate joint learning across administrative levels and inspires us to design a cross-level consistency loss as an extra regularization to stabilize model training. In addition, to capture the dynamic and volatile spatial correlations, we design a customized spatial module with adaptive edge gates, which can both reinforce effective messages and disable irrelevant ones. We put this framework into production to help the battle against COVID-19 in the United States. A comprehensive online evaluation over three months demonstrates that our projections are the most competitive among all results produced by dozens of international groups and even surpass the official ensemble in many cases. We also visualize our edge gates to understand the evolution of spatial correlations and present intuitive case studies. Finally, we open-source our implementation at https://github.com/dolphin-zs/HierST to facilitate future research towards better epidemic modeling.
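
One plausible reading of a cross-level consistency loss is sketched below in NumPy: since pandemic statistics for a higher administrative level are aggregated from lower levels, forecasts for child locations should sum to the forecast for their parent. This is an assumed form, not the loss defined in the paper or in the HierST repository. The function name, argument names, and the squared-error penalty are all hypothetical.

```python
import numpy as np

def cross_level_consistency_loss(child_pred, parent_pred, parent_of):
    """Mean squared gap between each parent's forecast and the sum of its children's forecasts.

    child_pred:  shape (num_children, horizon), e.g. county-level forecasts
    parent_pred: shape (num_parents, horizon), e.g. state-level forecasts
    parent_of:   shape (num_children,), integer parent index of each child
    """
    aggregated = np.zeros_like(parent_pred, dtype=float)
    np.add.at(aggregated, parent_of, child_pred)   # sum children into their parents
    return float(np.mean((aggregated - parent_pred) ** 2))

# Made-up forecasts: three child locations under two parents, horizon of two steps.
child_pred = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
parent_of = np.array([0, 0, 1])
consistent_parents = np.array([[4.0, 6.0], [5.0, 6.0]])
inconsistent_parents = np.array([[4.0, 6.0], [5.0, 8.0]])
print(cross_level_consistency_loss(child_pred, consistent_parents, parent_of))
print(cross_level_consistency_loss(child_pred, inconsistent_parents, parent_of))
```

Added to the forecasting loss with a small weight, a term of this kind acts as a regularizer: it penalizes predictions that disagree across levels without requiring any extra labels.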

Citations: 9

A Formal Analysis of Recommendation Quality of Adversarially-trained Recommenders

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482046

V. W. Anelli, Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms, among which the most widely adopted is Bayesian Personalized Ranking (BPR), which is based on a pair-wise optimization approach. Recently, BPR has been found to be vulnerable to adversarial perturbations of its model parameters. Adversarial Personalized Ranking (APR) mitigates this issue by robustifying BPR via an adversarial training procedure. The empirical accuracy improvements of APR over BPR have led to its wide use in several recommender models. However, a key overlooked aspect has been the beyond-accuracy performance of APR, i.e., novelty, coverage, and amplification of popularity bias, considering that recent results suggest that BPR, the building block of APR, is sensitive to the intensification of biases and reduction of recommendation novelty. In this work, we model the learning characteristics of the BPR and APR optimization frameworks to give mathematical evidence that, when the feedback data have a tailed distribution, APR amplifies the popularity bias more than BPR because of an unbalanced number of positive updates received from short-head items. Using matrix factorization (MF), we empirically validate the theoretical results by performing preliminary experiments on two public datasets to compare BPR-MF and APR-MF performance on accuracy and beyond-accuracy metrics. The experimental results consistently show a degradation of the novelty and coverage measures and a worrying amplification of bias.
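
For readers unfamiliar with the pair-wise objective mentioned here, the standard BPR loss with matrix-factorization scores can be sketched in a few lines of NumPy. For a user u, an interacted item i, and a sampled non-interacted item j, the loss is -log(sigmoid(x_ui - x_uj)), where each score is a dot product of latent vectors. The sketch omits the usual L2 regularization and does not include APR's adversarial perturbations. The function and variable names are illustrative.

```python
import numpy as np

def bpr_loss(user_vecs, pos_item_vecs, neg_item_vecs):
    """Mean BPR loss -log(sigmoid(x_ui - x_uj)) over a batch of (user, positive, negative) triples.

    Each argument has shape (batch, latent_dim); scores are MF dot products.
    """
    pos_scores = np.sum(user_vecs * pos_item_vecs, axis=1)
    neg_scores = np.sum(user_vecs * neg_item_vecs, axis=1)
    diff = pos_scores - neg_scores
    # -log(sigmoid(d)) equals log(1 + exp(-d)); logaddexp keeps this numerically stable.
    return float(np.mean(np.logaddexp(0.0, -diff)))

# Made-up latent vectors: the positive item scores higher than the negative one.
users = np.array([[1.0, 0.0]])
positives = np.array([[2.0, 0.0]])
negatives = np.array([[0.0, 0.0]])
print(bpr_loss(users, positives, negatives))
```

Because every training triple pushes its positive item's score upward, items that appear as positives more often receive more such updates, which is the mechanism behind the popularity-bias discussion in the abstract.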

Citations: 3

Tabular Functional Block Detection with Embedding-based Agglomerative Cell Clustering

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482484

Kexuan Sun, Fei Wang, Muhao Chen, J. Pujara

Tables are a widely used format for data curation. The diversity of domains, layouts, and content of tables makes knowledge extraction challenging. Understanding table layouts is an important step toward automatically harvesting knowledge from tabular data. Since table cells are spatially organized into regions, correctly identifying such regions and inferring their functional roles, referred to as functional block detection, is a critical part of understanding table layouts. Earlier functional block detection approaches fail to leverage spatial relationships and higher-level structure because they either depend on cell-level predictions or rely on data types as signals for identifying blocks. In this paper, we introduce a flexible functional block detection method that applies agglomerative clustering techniques to merge smaller blocks into larger blocks using two merging strategies. Our proposed method uses cell embeddings with a customized dissimilarity function that combines local and margin distances with block coherence metrics to capture cell-, block-, and table-scoped features. Given the diversity of tables in real-world corpora, we also introduce a sampling-based approach for automatically tuning distance thresholds for each table. Experimental results show that our method improves over the earlier state-of-the-art method on several evaluation metrics.
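The general idea of merging small blocks into larger ones until a distance threshold is exceeded can be sketched as follows. This is a minimal illustration only: it uses plain average-linkage Euclidean distance between cell embeddings as a stand-in for the paper's customized dissimilarity function, ignores spatial adjacency and block coherence, and takes a fixed threshold instead of the sampling-based tuning. All names and the toy embeddings are hypothetical.

```python
import math
from itertools import combinations


def block_distance(block_a, block_b, embeddings):
    """Average-linkage distance between two blocks of cell indices."""
    total = sum(
        math.dist(embeddings[i], embeddings[j]) for i in block_a for j in block_b
    )
    return total / (len(block_a) * len(block_b))


def agglomerate(embeddings, threshold):
    """Greedily merge the closest pair of blocks until none is within threshold."""
    blocks = [[i] for i in range(len(embeddings))]  # start with one block per cell
    while len(blocks) > 1:
        (x, y), dist = min(
            (
                ((p, q), block_distance(blocks[p], blocks[q], embeddings))
                for p, q in combinations(range(len(blocks)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break  # remaining blocks are too dissimilar to merge
        blocks[x] = blocks[x] + blocks[y]  # x < y, so deleting y keeps x valid
        del blocks[y]
    return blocks


# Hypothetical 2-D embeddings: two header-like cells and two data-like cells.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
toy_blocks = agglomerate(toy_embeddings, threshold=1.0)
```

A threshold that is too small leaves every cell in its own block, while one that is too large collapses the table into a single block, which is presumably why per-table threshold tuning matters.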

Citations: 0

Online Advertising Incrementality Testing: Practical Lessons And Emerging Challenges

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482031

Joel Barajas, Narayan L. Bhamidipati, J. Shanahan

Online advertising has historically been approached as an ad-to-user matching problem within sophisticated optimization algorithms. As the research and ad-tech industries have progressed, advertisers have increasingly emphasized estimating the causal effect of their ads (incrementality) using controlled experiments (A/B testing). With low lift effects and sparse conversions, developing incrementality testing platforms at scale poses tremendous engineering challenges in measurement precision. Similarly, correctly interpreting results against a business goal requires significant data science and experimentation research expertise. We propose a practical tutorial on the incrementality testing landscape, covering the business need; solutions from the literature and industry practices; designs in the development of testing platforms; and the testing cycle, case studies, and recommendations. We provide first-hand lessons based on developing such a platform in a major combined DSP and ad network and on running several tests of up to two months each over recent years.
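To make the measurement-precision point concrete, the following sketch estimates the incremental effect of an ad from a treatment group and a control group and attaches a normal-approximation (Wald) confidence interval to the difference in conversion rates. The estimator, the function name, and the example counts are illustrative assumptions, not the method or data of the tutorial's platform.

```python
import math


def incremental_lift(conv_treat, n_treat, conv_ctrl, n_ctrl, z=1.96):
    """Difference in conversion rates, relative lift, and a Wald interval."""
    p_treat = conv_treat / n_treat
    p_ctrl = conv_ctrl / n_ctrl
    if p_ctrl == 0:
        raise ValueError("control conversion rate is zero; relative lift undefined")
    diff = p_treat - p_ctrl
    # Standard error of the difference between two independent proportions.
    std_err = math.sqrt(
        p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl
    )
    return {
        "diff": diff,
        "relative_lift": diff / p_ctrl,
        "ci": (diff - z * std_err, diff + z * std_err),
    }


# Hypothetical counts: the same 10% relative lift at two different scales.
large_test = incremental_lift(1100, 1_000_000, 1000, 1_000_000)
small_test = incremental_lift(110, 100_000, 100, 100_000)
```

With these made-up counts, the interval excludes zero only at the larger scale, which illustrates why low lift and sparse conversions demand very large experiments.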

Citations: 0

Fair Graph Mining

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482030

Jian Kang, Hanghang Tong

In today's increasingly connected world, graph mining plays a pivotal role in many real-world application domains, including social network analysis, recommendations, marketing, and financial security. Tremendous efforts have been made to develop a wide range of computational models. However, recent studies have revealed that many widely applied graph mining models could suffer from potential discrimination. Fairness in graph mining aims to develop strategies that mitigate bias introduced or amplified during the mining process. The unique challenges of enforcing fairness in graph mining include (1) the theoretical challenge posed by the non-IID nature of graph data, which may invalidate the basic assumption behind many existing studies in fair machine learning, and (2) the algorithmic challenge of balancing model accuracy and fairness. This tutorial aims to (1) present a comprehensive review of state-of-the-art techniques for fairness in graph mining and (2) identify the open challenges and future trends. In particular, we start by reviewing the background, problem definitions, unique challenges, and related problems; we then give an in-depth overview of (1) recent techniques for enforcing group fairness, individual fairness, and other fairness notions in the context of graph mining, and (2) future directions in studying algorithmic fairness on graphs. We believe this tutorial could be attractive to researchers and practitioners in areas including data mining, artificial intelligence, and social science, and beneficial to a plethora of real-world application domains.
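As one small example of what a group fairness measure can look like, the sketch below computes the statistical parity gap over node-level predictions: the largest difference in positive-prediction rate between any two groups of nodes. Statistical parity is only one common group fairness notion and is chosen here for illustration; the abstract does not say which definitions the tutorial covers. The function name and toy labels are hypothetical.

```python
from collections import defaultdict


def statistical_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate across node groups."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hypothetical binary predictions for six nodes in two sensitive groups.
toy_predictions = [1, 1, 0, 0, 1, 0]
toy_groups = ["a", "a", "a", "b", "b", "b"]
toy_gap = statistical_parity_gap(toy_predictions, toy_groups)
```

A gap of zero means every group receives positive predictions at the same rate. Note that this check treats nodes independently, so it does not by itself address the non-IID structure of graph data that the abstract highlights.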

Citations: 21