
Latest publications in Transactions on Machine Learning Research

Knockout: A simple way to handle missing inputs.
Pub Date: 2025-07-01 Epub Date: 2025-07-19
Minh Nguyen, Batuhan K Karaman, Heejong Kim, Alan Q Wang, Fengbei Liu, Mert R Sabuncu

Deep learning models benefit from rich (e.g., multi-modal) input features. However, multimodal models might be challenging to deploy, because some inputs may be missing at inference. Current popular solutions include marginalization, imputation, and training multiple models. Marginalization achieves calibrated predictions, but it is computationally expensive and only feasible for low dimensional inputs. Imputation may result in inaccurate predictions, particularly when high-dimensional data, such as images, are missing. Training multiple models, where each model is designed to handle different subsets of inputs, can work well but requires prior knowledge of missing input patterns. Furthermore, training and retaining multiple models can be costly. We propose an efficient method to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification for Knockout and show that it can be interpreted as an implicit marginalization strategy. We evaluate Knockout across a wide range of simulations and real-world datasets and show that it offers strong empirical performance.
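As a rough illustration of the training-time replacement described above, the sketch below randomly replaces individual input features with a placeholder value; the masking probability, placeholder choice, and tensor layout are assumptions for this example, not details taken from the paper.

```python
import torch

def knockout(x, placeholder, p=0.3):
    """Randomly replace input features with placeholder values during training.

    x: (batch, n_features) input tensor; placeholder: (n_features,) fill values;
    p: probability that a given feature is knocked out. Illustrative sketch only.
    """
    mask = torch.rand_like(x) < p                        # which entries to knock out
    return torch.where(mask, placeholder.expand_as(x), x)

# Example usage: knock out features of a random batch with a zero placeholder.
x = torch.randn(8, 5)
x_knocked = knockout(x, placeholder=torch.zeros(5))
```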

{"title":"Knockout: A simple way to handle missing inputs.","authors":"Minh Nguyen, Batuhan K Karaman, Heejong Kim, Alan Q Wang, Fengbei Liu, Mert R Sabuncu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Deep learning models benefit from rich (e.g., multi-modal) input features. However, multimodal models might be challenging to deploy, because some inputs may be missing at inference. Current popular solutions include marginalization, imputation, and training multiple models. Marginalization achieves calibrated predictions, but it is computationally expensive and only feasible for low dimensional inputs. Imputation may result in inaccurate predictions, particularly when high-dimensional data, such as images, are missing. Training multiple models, where each model is designed to handle different subsets of inputs, can work well but requires prior knowledge of missing input patterns. Furthermore, training and retaining multiple models can be costly. We propose an efficient method to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification for Knockout and show that it can be interpreted as an implicit marginalization strategy. We evaluate Knockout across a wide range of simulations and real-world datasets and show that it offers strong empirical performance.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12809338/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145999975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining.
Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie

Large-scale general domain pretraining followed by downstream-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still lead to performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically involve continued pretraining on task-specific unlabeled datasets or introducing additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often determine the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language processing tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight. Our code is available at https://github.com/ruz048/TapWeight.
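To make the multi-level idea concrete, here is a minimal, self-contained toy, not the authors' implementation: a one-step-unrolled inner update on a weighted sum of two stand-in pretraining losses, followed by an outer update of the tradeoff weights driven by a downstream validation loss. The toy model, losses, and learning rates are all assumptions for illustration.

```python
import torch

torch.manual_seed(0)
theta = torch.randn(5, requires_grad=True)             # toy model parameters
w_logits = torch.zeros(2, requires_grad=True)          # logits of the objective weights
x, y = torch.randn(32, 5), torch.randn(32)             # stand-in pretraining data
xv, yv = torch.randn(8, 5), torch.randn(8)             # stand-in downstream validation data
lr_theta, lr_w = 0.1, 0.05

def pretrain_losses(p):
    # two stand-in pretraining objectives: a regression loss and a weight-decay-like loss
    return torch.stack([((x @ p) - y).pow(2).mean(), p.pow(2).mean()])

for step in range(100):
    w = torch.softmax(w_logits, dim=0)
    # inner step: virtual model update on the weighted pretraining loss, kept in the graph
    g = torch.autograd.grad((w * pretrain_losses(theta)).sum(), theta, create_graph=True)[0]
    theta_virtual = theta - lr_theta * g
    # outer step: downstream loss at the virtual parameters drives the weight update
    downstream = ((xv @ theta_virtual) - yv).pow(2).mean()
    w_grad = torch.autograd.grad(downstream, w_logits)[0]
    with torch.no_grad():
        w_logits -= lr_w * w_grad
        theta -= lr_theta * g                           # commit the model update
```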

{"title":"TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining.","authors":"Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Large-scale general domain pretraining followed by downstream-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still lead to performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically involve continued pretraining on task-specific unlabeled datasets or introducing additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often determine the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language processing tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight. Our code is available at https://github.com/ruz048/TapWeight.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377235/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144982041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization.
Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K Gupta, Pengtao Xie

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency. Our code is available at https://github.com/Alexiland/MLO-MAE.
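For reference, the snippet below sketches the patch-masking step the abstract describes: with scores=None it reproduces MAE's uniform random selection, while a learned per-patch score tensor (which a downstream-guided strategy such as MLO-MAE would supply) could be passed in instead. Shapes, the mask ratio, and the scoring interface are assumptions for illustration.

```python
import torch

def mask_patches(patches, mask_ratio=0.75, scores=None):
    """patches: (B, N, D) patch embeddings. Keep the top-scoring (1 - mask_ratio) fraction
    of patches visible; scores=None gives MAE's uniform random masking, whereas learned
    per-patch scores would implement an informed masking strategy. Sketch only.
    """
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    if scores is None:
        scores = torch.rand(B, N, device=patches.device)   # uniform random selection
    keep_idx = scores.topk(n_keep, dim=1).indices           # indices of visible patches
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep_idx

# Example: 4 images, 196 patches of dimension 768, default 75% masking.
visible, keep_idx = mask_patches(torch.randn(4, 196, 768))
```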

{"title":"Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization.","authors":"Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K Gupta, Pengtao Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency. Our code is available at https://github.com/Alexiland/MLO-MAE.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356090/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Accelerating Learned Image Compression Through Modeling Neural Training Dynamics.
Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into a few separate modes where parameters are expressed as affine transformations of reference parameters within the same mode. By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, interpolating sampled weights from stochastic gradient descent (SGD) training to obtain the moving average weights, ensuring smooth temporal behavior and minimizing training state variances. Overall, our method significantly reduces training space dimensions and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis of the noisy quadratic model, showing that the proposed method achieves a lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.
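As a small illustration of the sampling-then-moving-average idea (not the paper's exact procedure), the sketch below periodically samples the SGD weights and folds them into a running average used at evaluation time; the sampling interval, interpolation factor, and stand-in network are assumed values.

```python
import copy
import torch
import torch.nn as nn

def sma_update(avg_model, model, beta=0.9):
    """Fold freshly sampled SGD weights into the running average: avg <- beta*avg + (1-beta)*w."""
    with torch.no_grad():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg.mul_(beta).add_(p, alpha=1 - beta)

model = nn.Linear(16, 4)                      # stand-in for an LIC network
avg_model = copy.deepcopy(model)              # holds the moving-average weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(1000):
    loss = model(torch.randn(8, 16)).pow(2).mean()    # stand-in training loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    if step % 10 == 0:                                # sample every k steps (k assumed)
        sma_update(avg_model, model)
```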

{"title":"Accelerating Learned Image Compression Through Modeling Neural Training Dynamics.","authors":"Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into few separate modes where parameters are expressed as affine transformations of reference parameters within the same mode. By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, interpolating sampled weights from stochastic gradient descent (SGD) training to obtain the moving average weights, ensuring smooth temporal behavior and minimizing training state variances. Overall, our method significantly reduces training space dimensions and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis on the Noisy quadratic model, showing that the proposed method achieves a lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12129407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144210455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
On the stability of gradient descent with second order dynamics for time-varying cost functions.
Travis E Gibson, Sawal Acharya, Anjali Parashar, Joseph E Gaudio, Anuradha M Annaswamy

Gradient-based optimization algorithms deployed in Machine Learning (ML) applications are often analyzed and compared by their convergence rates or regret bounds. While these rates and bounds convey valuable information, they don't always directly translate to stability guarantees. Stability and similar concepts, like robustness, will become ever more important as we move towards deploying models in real-time and safety-critical systems. In this work, we build upon the results in Gaudio et al. 2021 and Moreu & Annaswamy 2022 for gradient descent with second-order dynamics when applied to explicitly time-varying cost functions and provide more general stability guarantees. These more general results can aid in the design and certification of these optimization schemes so as to help ensure safe and reliable deployment for real-time learning applications. We also hope that the techniques provided here will stimulate and cross-fertilize the analysis that occurs on the same algorithms from the online learning and stochastic optimization communities.
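For orientation, here is a minimal numerical example of gradient descent with second-order (momentum-like) dynamics tracking a time-varying quadratic cost; the specific dynamics, step sizes, and cost trajectory analyzed in the paper may differ, so treat this purely as an illustration of the setting.

```python
import numpy as np

# f_t(x) = 0.5 * (x - c(t))**2 with a drifting minimizer c(t); the iterate carries a
# velocity state, so the update is a discrete second-order dynamical system.
x, v = 0.0, 0.0
gamma, beta = 0.1, 0.9            # step size and momentum coefficient (assumed values)
tracking_error = []
for t in range(400):
    c = np.sin(0.02 * t)          # time-varying minimizer
    grad = x - c                  # gradient of f_t at the current iterate
    v = beta * v - gamma * grad   # velocity (second-order) update
    x = x + v                     # position update; stability means x keeps tracking c(t)
    tracking_error.append(abs(x - c))
```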

{"title":"On the stability of gradient descent with second order dynamics for time-varying cost functions.","authors":"Travis E Gibson, Sawal Acharya, Anjali Parashar, Joseph E Gaudio, Anuradha M Annaswamy","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Gradient based optimization algorithms deployed in Machine Learning (ML) applications are often analyzed and compared by their convergence rates or regret bounds. While these rates and bounds convey valuable information they don't always directly translate to stability guarantees. Stability and similar concepts, like robustness, will become ever more important as we move towards deploying models in real-time and safety critical systems. In this work we build upon the results in Gaudio et al. 2021 and Moreu & Annaswamy 2022 for gradient descent with second order dynamics when applied to explicitly time varying cost functions and provide more general stability guarantees. These more general results can aid in the design and certification of these optimization schemes so as to help ensure safe and reliable deployment for real-time learning applications. We also hope that the techniques provided here will stimulate and cross-fertilize the analysis that occurs on the same algorithms from the online learning and stochastic optimization communities.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12284918/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Transformer Architecture Search for Improving Out-of-Domain Generalization in Machine Translation.
Yiheng He, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie

Interest in automatically searching for Transformer neural architectures for machine translation (MT) has been increasing. Current methods show promising results in in-domain settings, where training and test data share the same distribution. However, in real-world MT applications, it is common that the test data has a different distribution than the training data. In these out-of-domain (OOD) situations, Transformer architectures optimized for the linguistic characteristics of the training sentences struggle to produce accurate translations for OOD sentences during testing. To tackle this issue, we propose a multi-level optimization based method to automatically search for neural architectures that possess robust OOD generalization capabilities. During the architecture search process, our method automatically synthesizes approximated OOD MT data, which is used to evaluate and improve the architectures' ability of generalizing to OOD scenarios. The generation of approximated OOD data and the search for optimal architectures are executed in an integrated, end-to-end manner. Evaluated across multiple datasets, our method demonstrates strong OOD generalization performance, surpassing state-of-the-art approaches. Our code is publicly available at https://github.com/yihenghe/transformer_nas.

{"title":"Transformer Architecture Search for Improving Out-of-Domain Generalization in Machine Translation.","authors":"Yiheng He, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Interest in automatically searching for Transformer neural architectures for machine translation (MT) has been increasing. Current methods show promising results in in-domain settings, where training and test data share the same distribution. However, in real-world MT applications, it is common that the test data has a different distribution than the training data. In these out-of-domain (OOD) situations, Transformer architectures optimized for the linguistic characteristics of the training sentences struggle to produce accurate translations for OOD sentences during testing. To tackle this issue, we propose a multi-level optimization based method to automatically search for neural architectures that possess robust OOD generalization capabilities. During the architecture search process, our method automatically synthesizes approximated OOD MT data, which is used to evaluate and improve the architectures' ability of generalizing to OOD scenarios. The generation of approximated OOD data and the search for optimal architectures are executed in an integrated, end-to-end manner. Evaluated across multiple datasets, our method demonstrates strong OOD generalization performance, surpassing state-of-the-art approaches. Our code is publicly available at https://github.com/yihenghe/transformer_nas.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356094/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Selective Classification Under Distribution Shifts.
Hengyue Liang, Le Peng, Ju Sun

In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers in high-stakes scenarios, whether the imperfection stems from intrinsic statistical noise in the data, from robustness issues of the classifier, or from factors beyond these, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus on the ideal statistical setting only, i.e., the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, in this paper, we propose an SC framework that takes into account distribution shifts, termed generalized selective classification, that covers label-shifted (or out-of-distribution) and covariate-shifted samples, in addition to typical in-distribution samples, the first of its kind in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers, and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. The code is available at https://github.com/sun-umn/sc_with_distshift.
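As a concrete reference point, the sketch below implements one simple margin-based confidence score (the gap between the top-two logits) and the corresponding abstention rule; the paper's proposed score functions are also margin-based but may differ in detail, so this is only an assumed, simplified variant.

```python
import torch

def margin_score(logits):
    """Confidence score: gap between the two largest raw logits per sample."""
    top2 = logits.topk(2, dim=1).values
    return top2[:, 0] - top2[:, 1]

def selective_predict(logits, threshold):
    """Predict the argmax class, or abstain (label -1) when the margin falls below threshold."""
    preds = logits.argmax(dim=1)
    abstain = torch.full_like(preds, -1)
    return torch.where(margin_score(logits) >= threshold, preds, abstain)

# Example: 6 samples, 10 classes; raising the threshold trades coverage for accuracy.
decisions = selective_predict(torch.randn(6, 10), threshold=1.0)
```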

{"title":"Selective Classification Under Distribution Shifts.","authors":"Hengyue Liang, Le Peng, Ju Sun","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers-either due to intrinsic statistical noise of data or for robustness issue of the classifier or beyond-in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus on the ideal statistical setting only, i.e., the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, in this paper, we propose an SC framework that takes into account distribution shifts, termed <i>generalized selective classification</i>, that covers label-shifted (or out-of-distribution) and covariate-shifted samples, in addition to typical in-distribution samples, <i>the first of its kind</i> in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers, and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. The code is available at https://github.com/sun-umn/sc_with_distshift.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470254/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145187750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning.
Mao Hong, Zhiyue Zhang, Yue Wu, Yanxun Xu

Model-based offline reinforcement learning (RL) methods have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees for MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies.
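To illustrate only the mirror-ascent policy step in the simplest setting (tabular policy, KL mirror map), not the paper's general-function-approximation algorithm or its pessimistic value estimation, one can update pi(a|s) proportionally to pi(a|s) * exp(eta * Q(s, a)) as in the sketch below; the step size and tabular shapes are assumptions.

```python
import numpy as np

def mirror_ascent_step(pi, Q, eta=0.5):
    """One mirror-ascent (exponentiated-gradient) policy update for a tabular policy.

    pi, Q: arrays of shape (num_states, num_actions); returns the updated policy.
    """
    logits = np.log(pi + 1e-12) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Example: 3 states, 2 actions, uniform initial policy.
pi = np.full((3, 2), 0.5)
Q = np.array([[1.0, 0.0], [0.2, 0.8], [0.0, 0.0]])
pi = mirror_ascent_step(pi, Q)
```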

{"title":"MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning.","authors":"Mao Hong, Zhiyue Zhang, Yue Wu, Yanxun Xu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Model-based offline reinforcement learning methods (RL) have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly-used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees for MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12742664/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145851635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers.
Pub Date: 2024-02-01 Epub Date: 2024-02-27
Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov

We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 2/3/4-bit precision on as little as one 24GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 2-bit and 3-bit LLMs for the first time, leveraging state-of-the-art 2-bit QuIP# quantization and 3-bit OPTQ quantization, and outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction-following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models as part of LLMTools, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
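A rough sketch of the layer-level idea follows: the frozen base weights live inside a black-box quantization module that can materialize them on demand, and only the low-rank adapter factors receive gradients. ToyQuantizer and its dequantize() interface are hypothetical stand-ins for the modular quantizers the paper integrates, not a real API.

```python
import torch
import torch.nn as nn

class ToyQuantizer:
    """Hypothetical black-box quantization module; a real one would hold 2/3/4-bit packed weights."""
    def __init__(self, W):
        self.packed = W.to(torch.float16)              # pretend low-precision storage
    def dequantize(self):
        return self.packed.float()                     # materialize weights for the forward pass

class QuantizedLoRALinear(nn.Module):
    """Frozen quantized base weights plus a trainable low-rank (LoRA) update. Sketch only."""
    def __init__(self, quantizer, in_features, out_features, rank=16, alpha=16.0):
        super().__init__()
        self.quantizer = quantizer
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))   # low-rank delta starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        W = self.quantizer.dequantize()                # no gradient flows into the base weights
        return x @ W.t() + self.scale * (x @ self.A.t()) @ self.B.t()

# Example: wrap a 64->32 layer; only A and B would be optimized during finetuning.
layer = QuantizedLoRALinear(ToyQuantizer(torch.randn(32, 64)), 64, 32)
out = layer(torch.randn(4, 64))
```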

{"title":"ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers.","authors":"Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 2/3/4-bit precision on as little as one 24GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 2-bit and 3-bit LLMs for the first time-leveraging state-of-the-art 2-bit QuIP# quantization and 3-bit OPTQ quantization-outperforming finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models as part of LLMTools, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362356/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Federated Learning with Convex Global and Local Constraints.
Pub Date: 2024-01-01 Epub Date: 2024-05-03
Chuan He, Le Peng, Ju Sun

In practice, many machine learning (ML) problems come with constraints, and their applied domains involve distributed sensitive data that cannot be shared with others, e.g., in healthcare. Collaborative learning in such practical scenarios entails federated learning (FL) for ML problems with constraints, or FL with constraints for short. Despite the extensive developments of FL techniques in recent years, these techniques only deal with unconstrained FL problems or FL problems with simple constraints that are amenable to easy projections. There is little work dealing with FL problems with general constraints. To fill this gap, we take the first step toward building an algorithmic framework for solving FL problems with general constraints. In particular, we propose a new FL algorithm for constrained ML problems based on the proximal augmented Lagrangian (AL) method. Assuming convex objective and convex constraints plus other mild conditions, we establish the worst-case complexity of the proposed algorithm. Our numerical experiments show the effectiveness of our algorithm in performing Neyman-Pearson classification and fairness-aware learning with nonconvex constraints, in an FL setting.
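To show the shape of a proximal augmented Lagrangian loop for a single convex inequality constraint, here is a minimal centralized sketch; the paper additionally solves each subproblem with federated updates across clients, which is omitted here, and the objective, constraint, penalty, proximal weight, and step sizes are all assumed for illustration.

```python
import numpy as np

def prox_al(f_grad, g, g_grad, x0, rho=10.0, mu=1.0, outer=30, inner=300, lr=0.01):
    """Solve min f(x) s.t. g(x) <= 0 with a proximal augmented Lagrangian outer loop.

    Each outer iteration approximately minimizes f(x) + AL penalty + (mu/2)*||x - x_prev||^2
    by gradient descent, then updates the multiplier: lam <- max(0, lam + rho*g(x)).
    """
    x, lam = np.array(x0, dtype=float), 0.0
    for _ in range(outer):
        x_prev = x.copy()
        for _ in range(inner):
            slack = max(0.0, lam + rho * g(x))          # weight of the AL penalty gradient
            grad = f_grad(x) + slack * g_grad(x) + mu * (x - x_prev)
            x -= lr * grad
        lam = max(0.0, lam + rho * g(x))
    return x, lam

# Example: minimize ||x||^2 subject to x[0] >= 1, written as g(x) = 1 - x[0] <= 0.
x_opt, lam_opt = prox_al(
    f_grad=lambda x: 2 * x,
    g=lambda x: 1.0 - x[0],
    g_grad=lambda x: np.array([-1.0, 0.0]),
    x0=[0.0, 0.0],
)
```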

{"title":"Federated Learning with Convex Global and Local Constraints.","authors":"Chuan He, Le Peng, Ju Sun","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In practice, many machine learning (ML) problems come with constraints, and their applied domains involve distributed sensitive data that cannot be shared with others, e.g., in healthcare. Collaborative learning in such practical scenarios entails federated learning (FL) for ML problems with constraints, or <i>FL with constraints</i> for short. Despite the extensive developments of FL techniques in recent years, these techniques only deal with unconstrained FL problems or FL problems with simple constraints that are amenable to easy projections. There is little work dealing with FL problems with general constraints. To fill this gap, we take the first step toward building an algorithmic framework for solving FL problems with general constraints. In particular, we propose a new FL algorithm for constrained ML problems based on the proximal augmented Lagrangian (AL) method. Assuming convex objective and convex constraints plus other mild conditions, we establish the worst-case complexity of the proposed algorithm. Our numerical experiments show the effectiveness of our algorithm in performing Neyman-Pearson classification and fairness-aware learning with nonconvex constraints, in an FL setting.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11295925/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0