Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning: Latest Articles

Training Deep Surrogate Models with Large Scale Online Learning

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-28 DOI: 10.48550/arXiv.2306.16133

Lucas Meyer, M. Schouler, R. Caulk, Alejandro Ribés, B. Raffin

The spatiotemporal resolution of Partial Differential Equations (PDEs) plays important roles in the mathematical description of the world's physical phenomena. In general, scientists and engineers solve PDEs numerically by the use of computationally demanding solvers. Recently, deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs. Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training. This paper advocates that relying on a traditional static dataset to train these models does not allow the full benefit of the solver to be used as a data generator. It proposes an open source online training framework for deep surrogate models. The framework implements several levels of parallelism focused on simultaneously generating numerical simulations and training deep neural networks. This approach suppresses the I/O and storage bottleneck associated with disk-loaded datasets, and opens the way to training on significantly larger datasets. Experiments compare the offline and online training of four surrogate models, including state-of-the-art architectures. Results indicate that exposing deep surrogate models to more dataset diversity, up to hundreds of GB, can increase model generalization capabilities. Fully connected neural networks, Fourier Neural Operator (FNO), and Message Passing PDE Solver prediction accuracy is improved by 68%, 16% and 7%, respectively.
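
The online-training idea in this abstract, where simulations are generated and consumed concurrently and never written to disk, can be illustrated with a minimal sketch. This is not the paper's framework: `toy_solver`, `producer`, and `train_online` are hypothetical names, the "solver" is a stand-in linear function, and the "surrogate" is a two-parameter linear model trained by plain SGD.

```python
import queue
import random
import threading


def toy_solver(param):
    """Hypothetical stand-in for an expensive PDE solver."""
    return 3.0 * param + 1.0


def producer(seed, n_samples, out_queue):
    """Generate (input, solution) pairs and stream them to the trainer."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        out_queue.put((x, toy_solver(x)))
    out_queue.put(None)  # sentinel: this producer is done


def train_online(n_producers=4, n_samples=2000, lr=0.05):
    """Fit y = weight * x + bias by SGD on samples as they arrive."""
    # Bounded in-memory buffer: nothing is stored on disk.
    buffer = queue.Queue(maxsize=64)
    threads = [
        threading.Thread(target=producer, args=(seed, n_samples, buffer))
        for seed in range(n_producers)
    ]
    for t in threads:
        t.start()

    weight, bias, seen, finished = 0.0, 0.0, 0, 0
    while finished < n_producers:
        item = buffer.get()
        if item is None:
            finished += 1
            continue
        x, y = item
        err = (weight * x + bias) - y
        weight -= lr * err * x
        bias -= lr * err
        seen += 1

    for t in threads:
        t.join()
    return weight, bias, seen


if __name__ == "__main__":
    print(train_online())
```

Each sample is seen exactly once and then discarded, so the amount of training data is limited by solver throughput rather than by storage.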

Pages : 24614-24630

Citations: 2

Adaptive Annealed Importance Sampling with Constant Rate Progress

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-27 DOI: 10.48550/arXiv.2306.15283

Shirin Goshtasbpour, Victor Cohen, F. Pérez-Cruz

Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution given its unnormalized density function. This algorithm relies on a sequence of interpolating distributions bridging the target to an initial tractable distribution such as the well-known geometric mean path of unnormalized distributions which is assumed to be suboptimal in general. In this paper, we prove that the geometric annealing corresponds to the distribution path that minimizes the KL divergence between the current particle distribution and the desired target when the feasible change in the particle distribution is constrained. Following this observation, we derive the constant rate discretization schedule for this annealing sequence, which adjusts the schedule to the difficulty of moving samples between the initial and the target distributions. We further extend our results to $f$-divergences and present the respective dynamics of annealing sequences based on which we propose the Constant Rate AIS (CR-AIS) algorithm and its efficient implementation for $\alpha$-divergences. We empirically show that CR-AIS performs well on multiple benchmark distributions while avoiding the computationally expensive tuning loop in existing Adaptive AIS.
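
As a point of reference for the geometric path mentioned above, here is a minimal sketch of standard AIS on a one-dimensional toy problem. It uses a uniform schedule of inverse temperatures, not the constant-rate schedule the paper derives, and every name in it (`log_f0`, `log_f1`, `ais_ratio`) is illustrative. The initial density is an unnormalized N(0, 1) and the target an unnormalized N(1, 0.5^2), so the true ratio of normalizing constants is 0.5.

```python
import math
import random


def log_f0(x):
    return -0.5 * x * x  # unnormalized N(0, 1)


def log_f1(x, mu=1.0, sigma=0.5):
    return -0.5 * ((x - mu) / sigma) ** 2  # unnormalized N(mu, sigma^2)


def log_f_beta(x, beta):
    # Geometric path: f_beta = f0^(1 - beta) * f1^beta
    return (1.0 - beta) * log_f0(x) + beta * log_f1(x)


def ais_ratio(n_particles=2000, n_steps=100, step=0.5, seed=0):
    """Estimate Z1 / Z0 with AIS along a uniformly discretized geometric path."""
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]
    log_ws = []
    for _ in range(n_particles):
        x = rng.gauss(0.0, 1.0)  # exact draw from the initial distribution
        log_w = 0.0
        for b_prev, b in zip(betas, betas[1:]):
            # Incremental importance weight at the current particle position.
            log_w += log_f_beta(x, b) - log_f_beta(x, b_prev)
            # One random-walk Metropolis step that leaves f_b invariant.
            prop = x + rng.gauss(0.0, step)
            delta = log_f_beta(prop, b) - log_f_beta(x, b)
            if rng.random() < math.exp(min(0.0, delta)):
                x = prop
        log_ws.append(log_w)
    # Average the weights in a numerically stable way.
    m = max(log_ws)
    return math.exp(m) * sum(math.exp(lw - m) for lw in log_ws) / n_particles


if __name__ == "__main__":
    print(ais_ratio())  # should be close to the true ratio 0.5
```

An adaptive scheme would replace the fixed `betas` list with steps chosen on the fly; the weight update and the invariant transition stay the same.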

Pages : 11642-11658

Citations: 3

High Fidelity Image Counterfactuals with Probabilistic Causal Models

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-27 DOI: 10.48550/arXiv.2306.15764

Fabio De Sousa Ribeiro, Tian Xia, M. Monteiro, Nick Pawlowski, B. Glocker

We present a general causal generative modelling framework for accurate estimation of high fidelity image counterfactuals with deep structural causal models. Estimation of interventional and counterfactual queries for high-dimensional structured variables, such as images, remains a challenging task. We leverage ideas from causal mediation analysis and advances in generative modelling to design new deep causal mechanisms for structured variables in causal models. Our experiments demonstrate that our proposed mechanisms are capable of accurate abduction and estimation of direct, indirect and total effects as measured by axiomatic soundness of counterfactuals.

Citations: 6

Semi Bandit Dynamics in Congestion Games: Convergence to Nash Equilibrium and No-Regret Guarantees

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-27 DOI: 10.48550/arXiv.2306.15543

Ioannis Panageas, Stratis Skoulakis, Luca Viano, Xiao Wang, V. Cevher

In this work, we introduce a new variant of online gradient descent, which provably converges to Nash Equilibria and simultaneously attains sublinear regret for the class of congestion games in the semi-bandit feedback setting. Our proposed method admits convergence rates depending only polynomially on the number of players and the number of facilities, but not on the size of the action set, which can be exponentially large in terms of the number of facilities. Moreover, the running time of our method has polynomial-time dependence on the implicit description of the game. As a result, our work answers an open question from (Du et al., 2022).

Citations: 1

FAIRER: Fairness as Decision Rationale Alignment

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-27 DOI: 10.48550/arXiv.2306.15299

Tianlin Li, Qing-Wu Guo, Aishan Liu, Mengnan Du, Zhiming Li, Yang Liu

Deep neural networks (DNNs) have made significant progress, but often suffer from fairness issues, as deep models typically show distinct accuracy differences among certain subgroups (e.g., males and females). Existing research addresses this critical issue by employing fairness-aware loss functions to constrain the last-layer outputs and directly regularize DNNs. Although the fairness of DNNs is improved, it is unclear how the trained network makes a fair prediction, which limits future fairness improvements. In this paper, we investigate fairness from the perspective of decision rationale and define the parameter parity score to characterize the fair decision process of networks by analyzing neuron influence in various subgroups. Extensive empirical studies show that the unfair issue could arise from the unaligned decision rationales of subgroups. Existing fairness regularization terms fail to achieve decision rationale alignment because they only constrain last-layer outputs while ignoring intermediate neuron alignment. To address the issue, we formulate the fairness as a new task, i.e., decision rationale alignment that requires DNNs' neurons to have consistent responses on subgroups at both intermediate processes and the final prediction. To make this idea practical during optimization, we relax the naive objective function and propose gradient-guided parity alignment, which encourages gradient-weighted consistency of neurons across subgroups. Extensive experiments on a variety of datasets show that our method can significantly enhance fairness while sustaining a high level of accuracy and outperforming other approaches by a wide margin.

Pages : 19471-19489

Citations: 2

LongCoder: A Long-Range Pre-trained Language Model for Code Completion

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-26 DOI: 10.48550/arXiv.2306.14893

Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley

In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens - bridge tokens and memory tokens - to improve performance and efficiency. Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction, while memory tokens are included to highlight important statements that may be invoked later and need to be memorized, such as package imports and definitions of classes, functions, or structures. We conduct experiments on a newly constructed dataset that contains longer code context and the publicly available CodeXGLUE benchmark. Experimental results demonstrate that LongCoder achieves superior performance on code completion tasks compared to previous models while maintaining comparable efficiency in terms of computational resources during inference. All the codes and data are available at https://github.com/microsoft/CodeBERT.
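
The attention pattern described here, a sliding window plus a small set of globally accessible positions, can be sketched as a boolean mask. This is a simplified illustration rather than LongCoder's implementation: it assumes causal (left-to-right) attention, treats bridge and memory tokens alike as "global" positions, and `sparse_causal_mask` is a hypothetical helper name.

```python
def sparse_causal_mask(seq_len, window, global_positions):
    """Return mask[i][j] == True when query i may attend to key j.

    A query sees (a) the last `window` positions up to and including itself
    and (b) any earlier position marked as globally accessible.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            local = (i - j) < window
            is_global = j in global_positions
            row.append(causal and (local or is_global))
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Position 0 plays the role of a globally accessible token,
    # e.g. an import statement that later code may need.
    for row in sparse_causal_mask(8, window=2, global_positions={0}):
        print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window + len(global_positions)` allowed entries, so attention cost grows linearly with sequence length instead of quadratically.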

Pages : 12098-12107

Citations: 4

Towards Trustworthy Explanation: On Causal Rationalization

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-25 DOI: 10.48550/arXiv.2306.14115

Wenbo Zhang, Tong Wu, Yunlong Wang, Yong Cai, Hengrui Cai

With recent advances in natural language processing, rationalization becomes an essential self-explaining diagram to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet, existing association-based approaches on rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we novelly leverage two causal desiderata, non-spuriousness and efficiency, into rationalization from the causal inference perspective. We formally define a series of probabilities of causation based on a newly proposed structural causal model of rationalization, with its theoretical identification established as the main component of learning necessary and sufficient rationales. The superior performance of the proposed causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods.

Citations: 6

Computational Asymmetries in Robust Classification

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-25 DOI: 10.48550/arXiv.2306.14326

Samuele Marro, M. Lombardi

In the context of adversarial robustness, we make three strongly related contributions. First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma_2^P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $\Sigma_2^P$-hard. Finally, motivated by our previous result, we argue that adversarial attacks can be used in the context of robustness certification, and provide an empirical evaluation of their effectiveness. As a byproduct of this process, we also release UG100, a benchmark dataset for adversarial attacks.

Citations: 0

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-23 DOI: 10.48550/arXiv.2306.15686

Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Y. Fu, Y. Lin

Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while avoiding over-fitting and catastrophic forgetting issues. Inspired by recent findings, we hypothesize that we can address the above challenges with modules widely shared across languages. To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. Specifically, Master-ASR learns a small set of generalizable sub-modules and adaptively assembles them for different languages to reduce the multilingual overhead and enable effective knowledge transfer for low-resource adaptation. Extensive experiments and visualizations demonstrate that Master-ASR can effectively discover language similarity and improve multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., under multilingual-ASR, our framework achieves a 0.13$\sim$2.41 lower character error rate (CER) with 30% smaller inference overhead over SOTA solutions on multilingual ASR and a comparable CER, with nearly 50 times fewer trainable parameters over SOTA solutions on low-resource tuning, respectively.

Pages : 40475-40487

Citations: 0

Approximate Causal Effect Identification under Weak Confounding

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-06-22 DOI: 10.48550/arXiv.2306.13242

Ziwei Jiang, Lai Wei, M. Kocaoglu

Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.

Pages : 15125-15143

Citations: 0
