Eura Shin, Predrag Klasnja, Susan A Murphy, Finale Doshi-Velez
Motivated by the need for efficient, personalized learning in mobile health, we investigate the problem of online compositional kernel selection for multi-task Gaussian Process regression. Existing composition selection methods do not satisfy our strict criteria in health settings: selection must occur quickly, and the selected kernels must maintain the appropriate level of complexity, sparsity, and stability as data arrives online. We introduce the Kernel Evolution Model (KEM), a generative process for evolving kernel compositions in a way that manages the bias-variance trade-off as we observe more data about a user. Using pilot data, we learn a set of kernel evolutions that can be used to quickly select kernels for new test users. KEM reliably selects high-performing kernels for a range of synthetic and real data sets, including two health data sets.
{"title":"Online model selection by learning how compositional kernels evolve.","authors":"Eura Shin, Predrag Klasnja, Susan A Murphy, Finale Doshi-Velez","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Motivated by the need for efficient, personalized learning in mobile health, we investigate the problem of online compositional kernel selection for multi-task Gaussian Process regression. Existing composition selection methods do not satisfy our strict criteria in health; selection must occur quickly, and the selected kernels must maintain the appropriate level of complexity, sparsity, and stability as data arrives online. We introduce the Kernel Evolution Model (KEM), a generative process on how to evolve kernel compositions in a way that manages the bias-variance trade-off as we observe more data about a user. Using pilot data, we learn a set of <i>kernel evolutions</i> that can be used to quickly select kernels for new test users. KEM reliably selects high-performing kernels for a range of synthetic and real data sets, including two health data sets.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11142638/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the high cost and time-consuming nature of collecting labeled data, having insufficient labeled data is a common challenge that can negatively impact the performance of deep learning models when applied to real-world applications. Active learning (AL) aims to reduce the cost and time required for obtaining labeled data by selecting valuable samples during model training. However, recent works have pointed out the performance unreliability of existing AL algorithms for deep learning (DL) architectures under different scenarios, which manifests as their performance being comparable (or worse) to that of basic random selection. This behavior compromises the applicability of these approaches. We address this problem by proposing a theoretically motivated AL framework for DL architectures. We demonstrate that the most valuable samples for the model are those that, unsurprisingly, improve its performance on the entire dataset, most of which is unlabeled, and present a framework to efficiently estimate such performance (or loss) via influence functions, pseudo labels and diversity selection. Experimental results show that the proposed reliable active learning via influence functions (RALIF) can consistently outperform the random selection baseline as well as other existing, state-of-the-art active learning approaches.
{"title":"Reliable Active Learning via Influence Functions.","authors":"Meng Xia, Ricardo Henao","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Due to the high cost and time-consuming nature of collecting labeled data, having insufficient labeled data is a common challenge that can negatively impact the performance of deep learning models when applied to real-world applications. Active learning (AL) aims to reduce the cost and time required for obtaining labeled data by selecting valuable samples during model training. However, recent works have pointed out the performance unreliability of existing AL algorithms for deep learning (DL) architectures under different scenarios, which manifests as their performance being comparable (or worse) to that of basic random selection. This behavior compromises the applicability of these approaches. We address this problem by proposing a theoretically motivated AL framework for DL architectures. We demonstrate that the most valuable samples for the model are those that, unsurprisingly, improve its performance on the entire dataset, most of which is unlabeled, and present a framework to efficiently estimate such performance (or loss) via influence functions, pseudo labels and diversity selection. Experimental results show that the proposed <i>reliable active learning via influence functions</i> (RALIF) can consistently outperform the random selection baseline as well as other existing and state-of-the art active learning approaches.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483530/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145208297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nihal Murali, Aahlad Puli, Ke Yu, Rajesh Ranganath, Kayhan Batmanghelich
Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem. This hurts model generalization and poses problems when deploying them in safety-critical applications. This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons during the training process. We make the following observations: (1) While previous works highlight the harmful effects of spurious features on the generalization ability of DNNs, we emphasize that not all spurious features are harmful. Spurious features can be "benign" or "harmful" depending on whether they are "harder" or "easier" to learn than the core features for a given model. This definition is model and dataset dependent. (2) We build upon this premise and use instance difficulty methods (like Prediction Depth (Baldock et al., 2021)) to quantify "easiness" for a given model and to identify this behavior during the training phase. (3) We empirically show that the harmful spurious features can be detected by observing the learning dynamics of the DNN's early layers. In other words, easy features learned by the initial layers of a DNN early during the training can (potentially) hurt model generalization. We verify our claims on medical and vision datasets, both simulated and real, and justify the empirical success of our hypothesis by showing the theoretical connections between Prediction Depth and information-theoretic concepts like 𝒱-usable information (Ethayarajh et al., 2021). Lastly, our experiments show that monitoring only accuracy during training (as is common in machine learning pipelines) is insufficient to detect spurious features. We, therefore, highlight the need for monitoring early training dynamics using suitable instance difficulty metrics.
{"title":"Beyond Distribution Shift: Spurious Features Through the Lens of Training Dynamics.","authors":"Nihal Murali, Aahlad Puli, Ke Yu, Rajesh Ranganath, Kayhan Batmanghelich","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem. This hurts model generalization and poses problems when deploying them in safety-critical applications. This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons during the training process. We make the following observations: (1) While previous works highlight the harmful effects of spurious features on the generalization ability of DNNs, we emphasize that not all spurious features are harmful. Spurious features can be \"<i>benign</i>\" or <i>\"harmful\"</i> depending on whether they are \"harder\" or \"easier\" to learn than the core features for a given model. This definition is model and dataset dependent. (2) We build upon this premise and use <i>instance difficulty</i> methods (like Prediction Depth (Baldock et al., 2021)) to quantify \"easiness\" for a given model and to identify this behavior during the training phase. (3) We empirically show that the harmful spurious features can be detected by observing the learning dynamics of the DNN's <i>early layers</i>. In other words, easy features learned by the initial layers of a DNN early during the training can (potentially) hurt model generalization. We verify our claims on medical and vision datasets, both simulated and real, and justify the empirical success of our hypothesis by showing the theoretical connections between Prediction Depth and information-theoretic concepts like <math><mi>𝒱</mi></math>-usable information (Ethayarajh et al., 2021). Lastly, our experiments show that monitoring only accuracy during training (as is common in machine learning pipelines) is insufficient to detect spurious features. We, therefore, highlight the need for monitoring early training dynamics using suitable instance difficulty metrics.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11029547/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140863872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sina Baharlouei, Kelechi Ogudu, Sze-Chuan Suen, Meisam Razaviyayn
The ubiquity of missing values in real-world datasets poses a challenge for statistical inference and can prevent similar datasets from being analyzed in the same study, precluding many existing datasets from being used for new analyses. While an extensive collection of packages and algorithms has been developed for data imputation, the overwhelming majority perform poorly when there are many missing values and sample sizes are low, which are unfortunately common characteristics of empirical data. Such low-accuracy estimations adversely affect the performance of downstream statistical models. We develop a statistical inference framework for regression and classification in the presence of missing data without imputation. Our framework, RIFLE (Robust InFerence via Low-order moment Estimations), estimates low-order moments of the underlying data distribution, with corresponding confidence intervals, to learn a distributionally robust model. We specialize our framework to linear regression and normal discriminant analysis, and we provide convergence and performance guarantees. This framework can also be adapted to impute missing data. In numerical experiments, we compare RIFLE to several state-of-the-art approaches (including MICE, Amelia, MissForest, KNN-imputer, MIDA, and Mean Imputer) for imputation and inference in the presence of missing values. Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small. RIFLE is publicly available at https://github.com/optimization-for-data-driven-science/RIFLE.
{"title":"RIFLE: Imputation and Robust Inference from Low Order Marginals.","authors":"Sina Baharlouei, Kelechi Ogudu, Sze-Chuan Suen, Meisam Razaviyayn","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The ubiquity of missing values in real-world datasets poses a challenge for statistical inference and can prevent similar datasets from being analyzed in the same study, precluding many existing datasets from being used for new analyses. While an extensive collection of packages and algorithms have been developed for data imputation, the overwhelming majority perform poorly if there are many missing values and low sample sizes, which are unfortunately common characteristics in empirical data. Such low-accuracy estimations adversely affect the performance of downstream statistical models. We develop a statistical inference framework for <i>regression and classification in the presence of missing data without imputation</i>. Our framework, RIFLE (Robust InFerence via Low-order moment Estimations), estimates low-order moments of the underlying data distribution with corresponding confidence intervals to learn a distributionally robust model. We specialize our framework to linear regression and normal discriminant analysis, and we provide convergence and performance guarantees. This framework can also be adapted to impute missing data. In numerical experiments, we compare RIFLE to several state-of-the-art approaches (including MICE, Amelia, MissForest, KNN-imputer, MIDA, and Mean Imputer) for imputation and inference in the presence of missing values. Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small. RIFLE is publicly available at https://github.com/optimization-for-data-driven-science/RIFLE.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10977932/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140320107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quasi-Newton methods still face significant challenges in training large-scale neural networks due to the additional compute cost of Hessian-related computations and instability issues in stochastic training. A well-known method, L-BFGS, which efficiently approximates the Hessian using the history of parameter and gradient changes, suffers from convergence instability in stochastic training. So far, attempts to adapt L-BFGS to large-scale stochastic training have incurred considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into the L-BFGS update, greatly reducing stochastic noise in the Hessian and thereby stabilizing convergence during stochastic optimization. For model training at a large scale, mL-BFGS approximates a block-wise Hessian, enabling compute and memory costs to be distributed across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS's potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves noticeable speedup both per iteration and in wall-clock time.
{"title":"mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization.","authors":"Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in the Hessian related computations and instability issues in stochastic training. A well-known method, L-BFGS that efficiently approximates the Hessian using history parameter and gradient changes, suffers convergence instability in stochastic training. So far, attempts that adapt L-BFGS to large-scale stochastic training incur considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into L-BFGS update and greatly reduces stochastic noise in the Hessian, therefore stabilizing convergence during stochastic optimization. For model training at a large scale, mL-BFGS approximates a block-wise Hessian, thus enabling distributing compute and memory costs across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS's potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedup.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144982031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differentially private (DP) training preserves data privacy, usually at the cost of slower convergence (and thus lower accuracy) and more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous-time analysis through the lens of the neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk, not the convergence or calibration, whereas the per-sample gradient clipping (in both its flat and layerwise styles) only affects the convergence and calibration. Furthermore, we observe that while DP models trained with a small clipping norm usually achieve the best accuracy, they are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with a large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly better calibrated. Our code can be found at https://github.com/woodyx218/opacus_global_clipping.
{"title":"On the Convergence and Calibration of Deep Learning with Differential Privacy.","authors":"Zhiqi Bu, Hua Wang, Zongyu Dai, Qi Long","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Differentially private (DP) training preserves the data privacy usually at the cost of slower convergence (and thus lower accuracy), as well as more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous time analysis through the lens of neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training, for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk but not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration. Furthermore, we observe that while DP models trained with small clipping norm usually achieve the best accurate, but are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly more <i>calibrated</i>. Our code can be found at https://github.com/woodyx218/opacus_global_clipping.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10982613/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140337962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-05-29. DOI: 10.11648/j.mlr.20230801.11
Zeyu Wu, Hongyang He
A large proportion of total energy consumption is caused by buildings. Accurately predicting the heating and cooling demand of a building is crucial in the initial design phase in order to identify the most efficient option among candidate designs. In this paper, to explore how well basic machine learning algorithms solve this problem, different machine learning models were used to estimate the heating and cooling loads of buildings from data on the energy efficiency of buildings. The paper also discusses the performance of deep neural network prediction models and concludes that, among traditional machine learning algorithms, GradientBoostingRegressor achieves the best predictions, with its Heating prediction reaching 0.998553. Our machine learning algorithm HB-Regressor achieves even higher prediction accuracy, reaching 0.998672 and 0.995153 for heating and cooling respectively, but its fitting speed is not as fast as that of the GradientBoostingRegressor algorithm.
{"title":"Traditional Machine Learning Models for Building Energy Performance Prediction: A Comparative Research","authors":"Zeyu Wu, Hongyang He","doi":"10.11648/j.mlr.20230801.11","DOIUrl":"https://doi.org/10.11648/j.mlr.20230801.11","url":null,"abstract":": A large proportion of total energy consumption is caused by buildings. Accurately predicting the heating and cooling demand of a building is crucial in the initial design phase in order to determine the most efficient solution from various designs. In this paper, in order to explore the effectiveness of basic machine learning algorithms to solve this problem, different machine learning models were used to estimate the heating and cooling loads of buildings, utilising data on the energy efficiency of buildings. Notably, this paper also discusses the performance of deep neural network prediction models and concludes that among traditional machine learning algorithms, GradientBoostingRegressor achieves better predictions, with Heating prediction reaching 0.998553 and Cooling prediction Compared with our machine learning algorithm HB-Regressor, the prediction accuracy of HB-Regressor is higher, reaching 0.998672 and 0.995153 respectively, but the fitting speed is not as fast as the GradientBoostingRegressor algorithm.","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"82 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78974691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-31. DOI: 10.11648/j.mlr.20220702.12
C. Leung, Yuanxi Li
{"title":"Automatic Indexing of Digital Objects Through Learning from User Data","authors":"C. Leung, Yuanxi Li","doi":"10.11648/j.mlr.20220702.12","DOIUrl":"https://doi.org/10.11648/j.mlr.20220702.12","url":null,"abstract":"","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84342864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haotao Wang, Junyuan Hong, Jiayu Zhou, Zhangyang Wang
Increasing concerns have been raised about deep learning fairness in recent years. Existing fairness-aware machine learning methods mainly focus on the fairness of in-distribution data. However, in real-world applications, distribution shift between the training and test data is common. In this paper, we first show that the fairness achieved by existing methods can be easily broken by slight distribution shifts. To solve this problem, we propose a novel fairness learning method termed CUrvature MAtching (CUMA), which achieves robust fairness that generalizes to unseen domains with unknown distributional shifts. Specifically, CUMA enforces similar generalization ability on the majority and minority groups by matching the loss curvature distributions of the two groups. We evaluate our method on three popular fairness datasets. Compared with existing methods, CUMA achieves superior fairness under unseen distribution shifts without sacrificing either the overall accuracy or the in-distribution fairness.
{"title":"How Robust is Your Fairness? Evaluating and Sustaining Fairness under Unseen Distribution Shifts.","authors":"Haotao Wang, Junyuan Hong, Jiayu Zhou, Zhangyang Wang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Increasing concerns have been raised on deep learning fairness in recent years. Existing fairness-aware machine learning methods mainly focus on the fairness of in-distribution data. However, in real-world applications, it is common to have distribution shift between the training and test data. In this paper, we first show that the fairness achieved by existing methods can be easily broken by slight distribution shifts. To solve this problem, we propose a novel fairness learning method termed CUrvature MAtching (CUMA), which can achieve robust fairness generalizable to unseen domains with unknown distributional shifts. Specifically, CUMA enforces the model to have similar generalization ability on the majority and minority groups, by matching the loss curvature distributions of the two groups. We evaluate our method on three popular fairness datasets. Compared with existing methods, CUMA achieves superior fairness under unseen distribution shifts, without sacrificing either the overall accuracy or the in-distribution fairness.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10097499/pdf/nihms-1888011.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9310075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional causal inference approaches leverage observational study data to estimate the difference between observed (factual) and unobserved (counterfactual) outcomes for a potential treatment, known as the Conditional Average Treatment Effect (CATE). However, CATE compares outcomes on the first moment alone and so may be insufficient to reflect the full picture of treatment effects. As an alternative, estimating the full potential outcome distributions could provide greater insight. However, existing methods for estimating treatment-effect potential outcome distributions often impose restrictive or overly simplistic assumptions about these distributions. Here, we propose Collaborating Causal Networks (CCN), a novel methodology that goes beyond the estimation of CATE alone by learning the full potential outcome distributions. Estimation of outcome distributions via the CCN framework does not require restrictive assumptions about the underlying data-generating process (e.g. Gaussian errors). Additionally, our proposed method facilitates estimation of the utility of each possible treatment and permits individual-specific variation through utility functions (e.g. variability in risk tolerance). CCN not only extends outcome estimation beyond the traditional risk difference, but also enables a more comprehensive decision-making process through the definition of flexible comparisons. Under assumptions commonly made in the causal inference literature, we show that CCN learns distributions that asymptotically capture the correct potential outcome distributions. Furthermore, we propose an adjustment approach that is empirically effective in alleviating sample imbalance between treatment groups in observational studies. Finally, we evaluate the performance of CCN in multiple experiments on both synthetic and semi-synthetic data. We demonstrate that CCN learns improved distribution estimates compared to existing Bayesian and deep generative methods, as well as improved decisions with respect to a variety of utility functions.
{"title":"Estimating Potential Outcome Distributions with Collaborating Causal Networks.","authors":"Tianhui Zhou, William E Carson, David Carlson","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Traditional causal inference approaches leverage observational study data to estimate the difference in observed (factual) and unobserved (counterfactual) outcomes for a potential treatment, known as the Conditional Average Treatment Effect (CATE). However, CATE corresponds to the comparison on the first moment alone, and as such may be insufficient in reflecting the full picture of treatment effects. As an alternative, estimating the full potential outcome distributions could provide greater insights. However, existing methods for estimating treatment effect potential outcome distributions often impose restrictive or overly-simplistic assumptions about these distributions. Here, we propose Collaborating Causal Networks (CCN), a novel methodology which goes beyond the estimation of CATE alone by learning the <i>full potential outcome distributions</i>. Estimation of outcome distributions via the CCN framework does not require restrictive assumptions of the underlying data generating process (e.g. Gaussian errors). Additionally, our proposed method facilitates estimation of the utility of each possible treatment and permits individual-specific variation through utility functions (e.g. risk tolerance variability). CCN not only extends outcome estimation beyond traditional risk difference, but also enables a more comprehensive decision making process through definition of flexible comparisons. Under assumptions commonly made in the causal inference literature, we show that CCN learns distributions that asymptotically capture the correct potential outcome distributions. Furthermore, we propose an adjustment approach that is empirically effective in alleviating sample imbalance between treatment groups in observational studies. Finally, we evaluate the performance of CCN in multiple experiments on both synthetic and semi-synthetic data. We demonstrate that CCN learns improved distribution estimates compared to existing Bayesian and deep generative methods as well as improved decisions with respects to a variety of utility functions.</p>","PeriodicalId":75238,"journal":{"name":"Transactions on machine learning research","volume":"2022 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10769464/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139378979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}