
Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference) - Latest Publications

Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction
Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun
Traditional knowledge distillation in classification problems transfers the knowledge via class correlations in the soft label produced by teacher models, which are not available in regression problems like stock trading volume prediction. To remedy this, we present a novel distillation framework for training a lightweight student model to perform trading volume prediction given historical transaction data. Specifically, we turn the regression model into a probabilistic forecasting model, by training models to predict a Gaussian distribution to which the trading volume belongs. The student model can thus learn from the teacher at a more informative distributional level, by matching its predicted distributions to those of the teacher. Two correlational distillation objectives are further introduced to encourage the student to produce pair-wise relationships consistent with the teacher model. We evaluate the framework on a real-world stock volume dataset with two different time window settings. Experiments demonstrate that our framework is superior to strong baseline models, compressing the model size by $5\times$ while maintaining $99.6\%$ prediction accuracy. The extensive analysis further reveals that our framework is more effective than vanilla distillation methods under low-resource scenarios.
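The distributional matching step above has a convenient closed form: the KL divergence between two Gaussians. The sketch below is an editorial illustration, assuming hypothetical student and teacher heads that each emit a (mean, log-variance) pair for the volume; it is not the authors' released implementation.

```python
import torch

def gaussian_kl(mu_t, log_var_t, mu_s, log_var_s):
    """Closed-form KL( N(mu_t, var_t) || N(mu_s, var_s) ), elementwise."""
    var_t, var_s = log_var_t.exp(), log_var_s.exp()
    return 0.5 * (log_var_s - log_var_t + (var_t + (mu_t - mu_s) ** 2) / var_s - 1.0)

def distributional_kd_loss(student_out, teacher_out):
    """Match the student's predicted volume distribution to the teacher's.

    student_out / teacher_out: (batch, 2) tensors holding the predicted
    (mean, log-variance) of the Gaussian the trading volume belongs to.
    The teacher is detached so gradients flow only into the student.
    """
    mu_s, log_var_s = student_out.unbind(dim=-1)
    mu_t, log_var_t = teacher_out.detach().unbind(dim=-1)
    return gaussian_kl(mu_t, log_var_t, mu_s, log_var_s).mean()
```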
{"title":"Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction","authors":"Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun","doi":"10.48550/arXiv.2208.07232","DOIUrl":"https://doi.org/10.48550/arXiv.2208.07232","url":null,"abstract":"Traditional knowledge distillation in classification problems transfers the knowledge via class correlations in the soft label produced by teacher models, which are not available in regression problems like stock trading volume prediction. To remedy this, we present a novel distillation framework for training a light-weight student model to perform trading volume prediction given historical transaction data. Specifically, we turn the regression model into a probabilistic forecasting model, by training models to predict a Gaussian distribution to which the trading volume belongs. The student model can thus learn from the teacher at a more informative distributional level, by matching its predicted distributions to that of the teacher. Two correlational distillation objectives are further introduced to encourage the student to produce consistent pair-wise relationships with the teacher model. We evaluate the framework on a real-world stock volume dataset with two different time window settings. Experiments demonstrate that our framework is superior to strong baseline models, compressing the model size by $5times$ while maintaining $99.6%$ prediction accuracy. The extensive analysis further reveals that our framework is more effective than vanilla distillation methods under low-resource scenarios.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88659625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem
Zheng Wang, Wenjie Ruan
Recent research on the robustness of deep learning has shown that Vision Transformers (ViTs) surpass Convolutional Neural Networks (CNNs) under some perturbations, e.g., natural corruption, adversarial attacks, etc. Some papers argue that the superior robustness of ViTs comes from the segmentation of their input images; others say that Multi-head Self-Attention (MSA) is the key to preserving the robustness. In this paper, we introduce a principled and unified theoretical framework to investigate this argument on ViTs' robustness. We first theoretically prove that, unlike Transformers in Natural Language Processing, ViTs are Lipschitz continuous. Then we theoretically analyze the adversarial robustness of ViTs from the perspective of the Cauchy Problem, via which we can quantify how the robustness propagates through layers. We demonstrate that the first and last layers are the critical factors affecting the robustness of ViTs. Furthermore, based on our theory, we empirically show that, unlike the claims of existing research, MSA only contributes to the adversarial robustness of ViTs under weak adversarial attacks, e.g., FGSM, and surprisingly, MSA actually compromises the model's adversarial robustness under stronger attacks, e.g., PGD attacks.
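For context on the two attack strengths contrasted in the final sentence, the following is a standard textbook sketch of single-step FGSM versus multi-step PGD in PyTorch; it is illustrative and not code from this paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM: move inputs along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha, steps=10):
    """Multi-step PGD: iterated FGSM, projected back onto the L-inf eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # projection onto the eps-ball
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```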
{"title":"Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem","authors":"Zheng Wang, Wenjie Ruan","doi":"10.48550/arXiv.2208.00906","DOIUrl":"https://doi.org/10.48550/arXiv.2208.00906","url":null,"abstract":"Recent research on the robustness of deep learning has shown that Vision Transformers (ViTs) surpass the Convolutional Neural Networks (CNNs) under some perturbations, e.g., natural corruption, adversarial attacks, etc. Some papers argue that the superior robustness of ViT comes from the segmentation of its input images; others say that the Multi-head Self-Attention (MSA) is the key to preserving the robustness. In this paper, we aim to introduce a principled and unified theoretical framework to investigate such an argument on ViT's robustness. We first theoretically prove that, unlike Transformers in Natural Language Processing, ViTs are Lipschitz continuous. Then we theoretically analyze the adversarial robustness of ViTs from the perspective of the Cauchy Problem, via which we can quantify how the robustness propagates through layers. We demonstrate that the first and last layers are the critical factors to affect the robustness of ViTs. Furthermore, based on our theory, we empirically show that unlike the claims from existing research, MSA only contributes to the adversarial robustness of ViTs under weak adversarial attacks, e.g., FGSM, and surprisingly, MSA actually comprises the model's adversarial robustness under stronger attacks, e.g., PGD attacks.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77642926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Contextual Information and Commonsense Based Prompt for Emotion Recognition in Conversation
Jingjie Yi, Deqing Yang, Siyu Yuan, Caiyan Cao, Zhiyao Zhang, Yanghua Xiao
Emotion recognition in conversation (ERC) aims to detect the emotion of each utterance in a given conversation. Recently proposed ERC models have leveraged pre-trained language models (PLMs) with the paradigm of pre-training and fine-tuning to obtain good performance. However, these models seldom exploit PLMs' advantages thoroughly, and they perform poorly on conversations lacking explicit emotional expressions. In order to fully leverage the latent knowledge related to the emotional expressions in utterances, we propose a novel ERC model, CISPER, built on the new paradigm of prompt and language model (LM) tuning. Specifically, CISPER is equipped with a prompt blending contextual information and commonsense related to the interlocutor's utterances, to achieve ERC more effectively. Our extensive experiments demonstrate CISPER's superior performance over state-of-the-art ERC models, and the effectiveness of leveraging these two kinds of significant prompt information for performance gains. To reproduce our experimental results conveniently, CISPER's source code and datasets have been shared at https://github.com/DeqingYang/CISPER.
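As a generic illustration of the prompt-and-LM-tuning paradigm mentioned above (not CISPER's actual context-and-commonsense prompt), a masked language model can score emotion words at a mask slot appended to an utterance. The model name, template, and label words here are all assumptions made for the sketch.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base")

def emotion_scores(utterance, labels=("joy", "anger", "sadness", "neutral")):
    """Score candidate emotion words at a mask slot appended to the utterance."""
    prompt = f"{utterance} The speaker feels {tok.mask_token}."
    inputs = tok(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_pos]
    label_ids = [tok(" " + w, add_special_tokens=False).input_ids[0] for w in labels]
    return {w: logits[i].item() for w, i in zip(labels, label_ids)}
```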
{"title":"Contextual Information and Commonsense Based Prompt for Emotion Recognition in Conversation","authors":"Jingjie Yi, Deqing Yang, Siyu Yuan, Caiyan Cao, Zhiyao Zhang, Yanghua Xiao","doi":"10.48550/arXiv.2207.13254","DOIUrl":"https://doi.org/10.48550/arXiv.2207.13254","url":null,"abstract":"Emotion recognition in conversation (ERC) aims to detect the emotion for each utterance in a given conversation. The newly proposed ERC models have leveraged pre-trained language models (PLMs) with the paradigm of pre-training and fine-tuning to obtain good performance. However, these models seldom exploit PLMs' advantages thoroughly, and perform poorly for the conversations lacking explicit emotional expressions. In order to fully leverage the latent knowledge related to the emotional expressions in utterances, we propose a novel ERC model CISPER with the new paradigm of prompt and language model (LM) tuning. Specifically, CISPER is equipped with the prompt blending the contextual information and commonsense related to the interlocutor's utterances, to achieve ERC more effectively. Our extensive experiments demonstrate CISPER's superior performance over the state-of-the-art ERC models, and the effectiveness of leveraging these two kinds of significant prompt information for performance gains. To reproduce our experimental results conveniently, CISPER's sourcecode and the datasets have been shared at https://github.com/DeqingYang/CISPER.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80692792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning
Zeren Huang, Wenhao Chen, Weinan Zhang, Chuhan Shi, Furui Liu, Hui-Ling Zhen, M. Yuan, Jianye Hao, Yong Yu, Jun Wang
Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during the previous solution process, learning-to-branch methods have recently become superior to heuristics. As branch-and-bound is naturally a sequential decision-making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic at each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utility of branching decisions. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish promising samples from the long-term or short-term view, and train the branching model, named Branch Ranking, via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust, and generalizes better to large-scale MIP instances than widely used heuristics and state-of-the-art learning-based branching models.
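To make the ranking-based supervision concrete, here is a generic pairwise hinge ranking loss over branching candidates. This is a hedged sketch of the general technique under assumed inputs; the paper's reward assignment and offline RL objective are more involved.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(scores, ranks, margin=1.0):
    """Generic pairwise hinge loss for learning to rank branching candidates.

    scores: (n,) predicted scores for the n candidate variables at a node.
    ranks:  (n,) target ranks (lower = better), e.g. derived from a
            ranking-based reward assignment over collected trajectories.
    """
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)            # diff[i, j] = s_i - s_j
    better = (ranks.unsqueeze(1) < ranks.unsqueeze(0)).float()  # 1 where i should outrank j
    loss = better * F.relu(margin - diff)                       # penalize mis-ordered pairs
    return loss.sum() / better.sum().clamp(min=1.0)
```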
{"title":"Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning","authors":"Zeren Huang, Wenhao Chen, Weinan Zhang, Chuhan Shi, Furui Liu, Hui-Ling Zhen, M. Yuan, Jianye Hao, Yong Yu, Jun Wang","doi":"10.48550/arXiv.2207.13701","DOIUrl":"https://doi.org/10.48550/arXiv.2207.13701","url":null,"abstract":"Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during the previous solution process, learning to branch methods have recently become superior over heuristics. As branch-and-bound is naturally a sequential decision making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic on each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utilities of branching decisions. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish the promising samples from the long-term or short-term view, and train the branching model named Branch Ranking via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Rankink is more efficient and robust, and can better generalize to large scales of MIP instances compared to the widely used heuristics and state-of-the-art learning-based branching models.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74032765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attention, Filling in The Gaps for Generalization in Routing Problems
Ahmad Bdeir, Jonas K. Falkner, L. Schmidt-Thieme
Machine Learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods suffer from poor generalization when tackling problems of different sizes or different distributions. As a result, ML in vehicle routing has witnessed an expansion phase, with new methodologies being created for particular problem instances that become infeasible at larger problem sizes. This paper aims to encourage the consolidation of the field through understanding and improving existing models, namely the attention model by Kool et al. We identify two discrepancy categories for VRP generalization. The first is based on differences inherent to the problems themselves, and the second relates to architectural weaknesses that limit the model's ability to generalize. Our contribution is threefold: we first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention based on the alpha-entmax activation. We then target inherent differences through the use of a mixed-instance training method, which has been shown to outperform single-instance training in certain scenarios. Finally, we introduce a framework for inference-level data augmentation that improves performance by exploiting the model's lack of invariance to rotation and dilation changes.
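On the first point, the alpha-entmax family generalizes softmax to mappings that can assign exactly zero weight to some keys. Below is a minimal sketch of entmax-based attention, assuming the open-source `entmax` package (github.com/deep-spin/entmax); the authors' full Sparse Dynamic Attention adaptation is not reproduced here.

```python
import torch
from entmax import entmax_bisect  # pip install entmax

def sparse_attention(q, k, v, alpha=1.5):
    """Scaled dot-product attention with alpha-entmax in place of softmax.

    alpha=1 recovers softmax and alpha=2 is sparsemax; intermediate
    values yield attention rows that sum to 1 but may contain zeros.
    """
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    weights = entmax_bisect(scores, alpha=alpha, dim=-1)
    return weights @ v
```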
{"title":"Attention, Filling in The Gaps for Generalization in Routing Problems","authors":"Ahmad Bdeir, Jonas K. Falkner, L. Schmidt-Thieme","doi":"10.48550/arXiv.2207.07212","DOIUrl":"https://doi.org/10.48550/arXiv.2207.07212","url":null,"abstract":"Machine Learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods suffer from poor generalization when tackling problems of different sizes or different distributions. As a result, ML in vehicle routing has witnessed an expansion phase with new methodologies being created for particular problem instances that become infeasible at larger problem sizes. This paper aims at encouraging the consolidation of the field through understanding and improving current existing models, namely the attention model by Kool et al. We identify two discrepancy categories for VRP generalization. The first is based on the differences that are inherent to the problems themselves, and the second relates to architectural weaknesses that limit the model's ability to generalize. Our contribution becomes threefold: We first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention based on the alpha-entmax activation. We then target inherent differences through the use of a mixed instance training method that has been shown to outperform single instance training in certain scenarios. Finally, we introduce a framework for inference level data augmentation that improves performance by leveraging the model's lack of invariance to rotation and dilation changes.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81156068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
MRF-UNets: Searching UNet with Markov Random Fields
Zifu Wang, Matthew B. Blaschko
UNet [27] is widely used in semantic segmentation due to its simplicity and effectiveness. However, its manually designed architecture is applied to a large number of problem settings, either with no architecture optimization or with manual tuning, which is time-consuming and can be sub-optimal. In this work, firstly, we propose Markov Random Field Neural Architecture Search (MRF-NAS), which extends and improves the recent Adaptive and Optimal Network Width Search (AOWS) method [4] with (i) a more general MRF framework, (ii) diverse M-best loopy inference, and (iii) differentiable parameter learning. This provides the necessary NAS framework to efficiently explore network architectures that induce loopy inference graphs, including loops that arise from skip connections. With UNet as the backbone, we find an architecture, MRF-UNet, that shows several interesting characteristics. Secondly, through the lens of these characteristics, we identify the sub-optimality of the original UNet architecture and further improve our results with MRF-UNetV2. Experiments show that our MRF-UNets significantly outperform several benchmarks on three aerial image datasets and two medical image datasets while maintaining low computational costs. The code is available at: https://github.com/zifuwanggg/MRF-UNets.
{"title":"MRF-UNets: Searching UNet with Markov Random Fields","authors":"Zifu Wang, Matthew B. Blaschko","doi":"10.48550/arXiv.2207.06168","DOIUrl":"https://doi.org/10.48550/arXiv.2207.06168","url":null,"abstract":"UNet [27] is widely used in semantic segmentation due to its simplicity and effectiveness. However, its manually-designed architecture is applied to a large number of problem settings, either with no architecture optimizations, or with manual tuning, which is time consuming and can be sub-optimal. In this work, firstly, we propose Markov Random Field Neural Architecture Search (MRF-NAS) that extends and improves the recent Adaptive and Optimal Network Width Search (AOWS) method [4] with (i) a more general MRF framework (ii) diverse M-best loopy inference (iii) differentiable parameter learning. This provides the necessary NAS framework to efficiently explore network architectures that induce loopy inference graphs, including loops that arise from skip connections. With UNet as the backbone, we find an architecture, MRF-UNet, that shows several interesting characteristics. Secondly, through the lens of these characteristics, we identify the sub-optimality of the original UNet architecture and further improve our results with MRF-UNetV2. Experiments show that our MRF-UNets significantly outperform several benchmarks on three aerial image datasets and two medical image datasets while maintaining low computational costs. The code is available at: https://github.com/zifuwanggg/MRF-UNets.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90542317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Logistics, Graphs, and Transformers: Towards improving Travel Time Estimation
Natalia Semenova, Vadim Porvatov, V. Tishin, Artyom Sosedka, Vladislav Zamkovoy
The problem of travel time estimation is widely regarded as a fundamental challenge of modern logistics. The complex interconnections between the spatial layout of roads and the temporal dynamics of ground transport still leave room for experimentation. However, the volume of data now being accumulated encourages the construction of learning models that have the potential to significantly outperform earlier solutions. To address the travel time estimation problem, we propose a new method based on the transformer architecture, TransTTE.
{"title":"Logistics, Graphs, and Transformers: Towards improving Travel Time Estimation","authors":"Natalia Semenova, Vadim Porvatov, V. Tishin, Artyom Sosedka, Vladislav Zamkovoy","doi":"10.48550/arXiv.2207.05835","DOIUrl":"https://doi.org/10.48550/arXiv.2207.05835","url":null,"abstract":"The problem of travel time estimation is widely considered as the fundamental challenge of modern logistics. The complex nature of interconnections between spatial aspects of roads and temporal dynamics of ground transport still preserves an area to experiment with. However, the total volume of currently accumulated data encourages the construction of the learning models which have the perspective to significantly outperform earlier solutions. In order to address the problems of travel time estimation, we propose a new method based on transformer architecture - TransTTE.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75365716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
FairDistillation: Mitigating Stereotyping in Language Models
Pieter Delobelle, Bettina Berendt
Large pre-trained language models are used successfully in a variety of tasks across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult in general and becomes computationally expensive when tackling multiple languages or when considering different biases. To address this, we present FairDistillation: a cross-lingual method based on knowledge distillation to construct smaller language models while controlling for specific biases. We found that our distillation method does not negatively affect the downstream performance on most tasks and successfully mitigates stereotyping and representational harms. We demonstrate that FairDistillation can create fairer language models at a considerably lower cost than alternative approaches.
{"title":"FairDistillation: Mitigating Stereotyping in Language Models","authors":"Pieter Delobelle, Bettina Berendt","doi":"10.48550/arXiv.2207.04546","DOIUrl":"https://doi.org/10.48550/arXiv.2207.04546","url":null,"abstract":"Large pre-trained language models are successfully being used in a variety of tasks, across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult to do in general and becomes computationally expensive when tackling multiple languages or when considering different biases. To address this, we present FairDistillation: a cross-lingual method based on knowledge distillation to construct smaller language models while controlling for specific biases. We found that our distillation method does not negatively affect the downstream performance on most tasks and successfully mitigates stereotyping and representational harms. We demonstrate that FairDistillation can create fairer language models at a considerably lower cost than alternative approaches.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74295205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Model Selection in Reinforcement Learning with General Function Approximations
Avishek Ghosh, Sayak Ray Chowdhury
We consider model selection for classic Reinforcement Learning (RL) environments -- Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, in which the true models -- the reward-generating function for MABs and the transition kernel for MDPs -- lie, respectively. Instead, we are given $M$ nested function (hypothesis) classes such that the true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that \emph{adapt} to the smallest function class (among the nested $M$ classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $\mathcal{F}$ and $\mathcal{M}$) a priori. Furthermore, for both settings, we show that the cost of model selection is an additive term in the regret with weak (logarithmic) dependence on the learning horizon $T$.
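Schematically, the guarantee described above can be written as follows; this display is an editorial paraphrase of the abstract, with constants and the exact dependence on the number of classes $M$ elided.

```latex
\mathrm{Regret}(T)
  \;\le\;
  \underbrace{\mathrm{Regret}_{\mathrm{oracle}}(T)}_{\text{knows the true classes a priori}}
  \;+\;
  \underbrace{C_M \,\log T}_{\text{model-selection cost}}
```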
{"title":"Model Selection in Reinforcement Learning with General Function Approximations","authors":"Avishek Ghosh, Sayak Ray Chowdhury","doi":"10.48550/arXiv.2207.02992","DOIUrl":"https://doi.org/10.48550/arXiv.2207.02992","url":null,"abstract":"We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know the function classes, denoted by $mathcal{F}$ and $mathcal{M}$, where the true models -- reward generating function for MABs and and transition kernel for MDPs -- lie, respectively. Instead, we are given $M$ nested function (hypothesis) classes such that true models are contained in at-least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs, that emph{adapt} to the smallest function class (among the nested $M$ classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms match to that of an oracle which knows the correct function classes (i.e., $cF$ and $cM$) a priori. Furthermore, for both the settings, we show that the cost of model selection is an additive term in the regret having weak (logarithmic) dependence on the learning horizon $T$.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89225991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
PRoA: A Probabilistic Robustness Assessment against Functional Perturbations
Tianle Zhang, Wenjie Ruan, J. Fieldsend
In safety-critical deep learning applications, robustness measurement is a vital pre-deployment phase. However, existing robustness verification methods are not sufficiently practical for deploying machine learning systems in the real world. On the one hand, these methods attempt to claim that no perturbations can "fool" deep neural networks (DNNs), which may be too stringent in practice. On the other hand, existing works rigorously consider $L_p$-bounded additive perturbations on the pixel space, although perturbations such as colour shifting and geometric transformations occur more practically and frequently in the real world. Thus, from the practical standpoint, we present a novel and general probabilistic robustness assessment method (PRoA) based on adaptive concentration, which can measure the robustness of deep learning models against functional perturbations. PRoA can provide statistical guarantees on the probabilistic robustness of a model, i.e., the probability of failure encountered by the trained model after deployment. Our experiments demonstrate the effectiveness and flexibility of PRoA in evaluating probabilistic robustness against a broad range of functional perturbations, and PRoA scales well to various large-scale deep neural networks compared to existing state-of-the-art baselines. For the purpose of reproducibility, we release our tool on GitHub: https://github.com/TrustAI/PRoA.
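To ground the idea of a statistical guarantee on the probability of failure, here is a simplified Monte Carlo estimator with a fixed-sample Hoeffding radius. It is an editorial sketch with a hypothetical `perturb` sampler, whereas PRoA itself relies on adaptive concentration inequalities that tighten such bounds sequentially.

```python
import math
import torch

def estimate_failure_probability(model, x, y, perturb, n_samples=10000, delta=0.01):
    """Estimate P(model misclassifies a randomly perturbed x), with a bound.

    perturb: callable returning one random functional perturbation of x,
             e.g. a random colour shift or geometric transformation.
    Returns the empirical failure rate and a Hoeffding radius such that the
    true failure probability lies within it with probability >= 1 - delta.
    """
    failures = 0
    with torch.no_grad():
        for _ in range(n_samples):
            pred = model(perturb(x)).argmax(dim=-1)
            failures += int((pred != y).any())
    p_hat = failures / n_samples
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))
    return p_hat, radius
```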
{"title":"PRoA: A Probabilistic Robustness Assessment against Functional Perturbations","authors":"Tianle Zhang, Wenjie Ruan, J. Fieldsend","doi":"10.48550/arXiv.2207.02036","DOIUrl":"https://doi.org/10.48550/arXiv.2207.02036","url":null,"abstract":"In safety-critical deep learning applications robustness measurement is a vital pre-deployment phase. However, existing robustness verification methods are not sufficiently practical for deploying machine learning systems in the real world. On the one hand, these methods attempt to claim that no perturbations can ``fool'' deep neural networks (DNNs), which may be too stringent in practice. On the other hand, existing works rigorously consider $L_p$ bounded additive perturbations on the pixel space, although perturbations, such as colour shifting and geometric transformations, are more practically and frequently occurring in the real world. Thus, from the practical standpoint, we present a novel and general {it probabilistic robustness assessment method} (PRoA) based on the adaptive concentration, and it can measure the robustness of deep learning models against functional perturbations. PRoA can provide statistical guarantees on the probabilistic robustness of a model, textit{i.e.}, the probability of failure encountered by the trained model after deployment. Our experiments demonstrate the effectiveness and flexibility of PRoA in terms of evaluating the probabilistic robustness against a broad range of functional perturbations, and PRoA can scale well to various large-scale deep neural networks compared to existing state-of-the-art baselines. For the purpose of reproducibility, we release our tool on GitHub: url{ https://github.com/TrustAI/PRoA}.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82438440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8