
Latest publications: Proceedings of machine learning research

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks.
Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a single task or (ii) they are linear, very little is known about the closer-to-practice case of nonlinear NNs trained on multiple tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an r-dimensional subspace within the d ≫ r-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of d. In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all r ground-truth features.

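To make the setting concrete, here is a minimal NumPy sketch of the data model and a plain gradient loop (all dimensions, the step size, and the update rule are illustrative assumptions, not the paper's algorithm): each binary task labels points by a random halfspace of an r-dimensional projection, and a two-layer ReLU network with a shared first layer and per-task heads is trained on all tasks jointly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_tasks, n = 40, 3, 25, 200     # input dim, subspace dim, tasks, samples/task

U, _ = np.linalg.qr(rng.normal(size=(d, r)))   # ground-truth subspace (d x r)

# Binary tasks: each labels x by a random halfspace of the projection U^T x.
X = rng.normal(size=(n_tasks, n, d))
w = rng.normal(size=(n_tasks, r))
Y = np.sign(np.einsum('tnd,dr,tr->tn', X, U, w))

# Two-layer ReLU net: shared first layer W, per-task linear heads A.
k, lr = 64, 0.05
W = rng.normal(size=(k, d)) / np.sqrt(d)
A = rng.normal(size=(n_tasks, k)) / np.sqrt(k)

for _ in range(300):
    H = np.maximum(X @ W.T, 0.0)                    # (t, n, k) hidden activations
    G = np.einsum('tnk,tk->tn', H, A) - Y           # squared-loss residuals
    M = (H > 0).astype(float)                       # ReLU gradient mask
    dW = np.einsum('tn,tk,tnk,tnd->kd', G, A, M, X) / (n_tasks * n)
    dA = np.einsum('tn,tnk->tk', G, H) / n          # per-task head gradients
    W -= lr * dW
    A -= lr * dA

# How much of the learned first layer lies in the ground-truth subspace?
frac = np.linalg.norm(W @ U) ** 2 / np.linalg.norm(W) ** 2
print(f"fraction of first-layer energy inside the true subspace: {frac:.2f}")
```

If the paper's claim carries over to this toy regime, the printed fraction should end up well above the r/d ≈ 0.075 alignment of a random initialization.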
{"title":"Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks.","authors":"Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a <i>single</i> task or (ii) they are <i>linear</i>, very little is known about the closer-to-practice case of <i>nonlinear</i> NNs trained on <i>multiple</i> tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an <math><mi>r</mi></math> -dimensional subspace within the <math><mi>d</mi> <mo>≫</mo> <mi>r</mi></math> -dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of <math><mi>d</mi></math> . In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all <math><mi>r</mi></math> ground-truth features.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"9292-9345"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486479/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions.
Weihan Li, Chengrui Li, Yule Wang, Anqi Wu

Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables with frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work establishes a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.

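The core construction (latent oscillations with explicit frequency and phase delay inside a linear state space) can be illustrated in a few lines. The sketch below is not the MRM-GP model or its inference code; it only shows how a rotation block in an LDS encodes a frequency, and how a phase-delayed readout yields a directional lag between two "regions". All constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T, f = 1e-3, 4000, 8.0            # 1 ms bins, 4 s of data, 8 Hz latent rhythm
omega = 2 * np.pi * f * dt            # rotation per time step (radians)
R = 0.999 * np.array([[np.cos(omega), -np.sin(omega)],
                      [np.sin(omega),  np.cos(omega)]])

z = np.zeros((T, 2))                  # latent oscillator: a 2-d rotation block
for t in range(1, T):
    z[t] = R @ z[t - 1] + rng.normal(scale=0.1, size=2)

phi = np.pi / 4                       # region B reads the oscillation phi late
x_A = z[:, 0] + rng.normal(scale=0.05, size=T)
x_B = z @ [np.cos(phi), np.sin(phi)] + rng.normal(scale=0.05, size=T)

# The directional phase delay appears as the lag of the peak cross-correlation.
lags = np.arange(-100, 101)
xc = [np.corrcoef(x_A[100:-100], np.roll(x_B, L)[100:-100])[0, 1] for L in lags]
lag_ms = lags[int(np.argmax(xc))] * dt * 1e3
print(f"peak cross-correlation at {lag_ms:+.0f} ms; expected magnitude = "
      f"{phi / (2 * np.pi * f) * 1e3:.1f} ms")
```

Because the dynamics are Markovian, filtering in such a model costs a constant amount of work per time step, which is the linear-in-time inference property the abstract emphasizes.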
{"title":"Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions.","authors":"Weihan Li, Chengrui Li, Yule Wang, Anqi Wu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables with frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work establishes a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"28112-28131"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526605/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142559682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Source Conformal Inference Under Distribution Shift.
Yi Liu, Alexander W Levis, Sharon-Lise Normand, Larry Han

Recent years have experienced increasing utilization of complex machine learning models across multiple sources of data to inform more generalizable decision-making. However, distribution shifts across data sources and privacy concerns related to sharing individual-level data, coupled with a lack of uncertainty quantification from machine learning predictions, make it challenging to achieve valid inferences in multi-source environments. In this paper, we consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations, and show that one can incorporate machine learning prediction algorithms in the estimation of nuisance functions while still achieving parametric rates of convergence to nominal coverage probabilities. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments. Hospital length of stay prediction intervals for pediatric patients undergoing a high-risk cardiac surgical procedure between 2016-2022 in the U.S. illustrate the utility of our methodology.

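For intuition about the weighting idea, here is a toy split-conformal sketch in which calibration residuals from several sources are pooled with per-source weights, so a shifted source can be downweighted to tighten the interval. The function and weights are illustrative assumptions; the paper's estimators are built from efficient influence functions, not this simple pooling.

```python
import numpy as np

def weighted_conformal_interval(res_by_source, src_weights, mu_x, alpha=0.1):
    """Split-conformal interval mu(x) +/- q, with q a weighted quantile of
    absolute calibration residuals pooled across sources. Illustrative only."""
    scores = np.concatenate(res_by_source)
    w = np.concatenate([np.full(len(s), wk)
                        for s, wk in zip(res_by_source, src_weights)])
    order = np.argsort(scores)
    cdf = np.cumsum(w[order]) / w.sum()
    q = scores[order][min(np.searchsorted(cdf, 1 - alpha), len(scores) - 1)]
    return mu_x - q, mu_x + q

rng = np.random.default_rng(2)
mu = lambda x: 2.0 * x                        # pretend this regression is fitted
residuals = []
for scale in (1.0, 1.1, 3.0):                 # third source is distribution-shifted
    x = rng.normal(size=300)
    y = 2.0 * x + rng.normal(scale=scale, size=300)
    residuals.append(np.abs(y - mu(x)))

print(weighted_conformal_interval(residuals, [1.0, 1.0, 1.0], mu(0.5)))  # wide
print(weighted_conformal_interval(residuals, [1.0, 1.0, 0.1], mu(0.5)))  # tighter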
{"title":"Multi-Source Conformal Inference Under Distribution Shift.","authors":"Yi Liu, Alexander W Levis, Sharon-Lise Normand, Larry Han","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Recent years have experienced increasing utilization of complex machine learning models across multiple sources of data to inform more generalizable decision-making. However, distribution shifts across data sources and privacy concerns related to sharing individual-level data, coupled with a lack of uncertainty quantification from machine learning predictions, make it challenging to achieve valid inferences in multi-source environments. In this paper, we consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations, and show that one can incorporate machine learning prediction algorithms in the estimation of nuisance functions while still achieving parametric rates of convergence to nominal coverage probabilities. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments. Hospital length of stay prediction intervals for pediatric patients undergoing a high-risk cardiac surgical procedure between 2016-2022 in the U.S. illustrate the utility of our methodology.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"31344-31382"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11345809/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142082878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Adapt and Diffuse: Sample-Adaptive Reconstruction Via Latent Diffusion Models.
Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi

Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the structure of the ground truth signal, the severity of the degradation and the complex interactions between the above. This results in natural sample-by-sample variation in the difficulty of a reconstruction task, which is often overlooked by contemporary techniques. Our key observation is that most existing inverse problem solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in subpar performance and wasteful resource allocation. We propose a novel method that we call severity encoding, to estimate the degradation severity of noisy, degraded signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can give useful hints at the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. Our framework acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration. We perform numerical experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to 10× acceleration in mean sampling speed.

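A minimal sketch of the sample-adaptive idea, with hypothetical names throughout (encoder, severity_head, and the schedule below are stand-ins, not the paper's API): estimate degradation severity in the autoencoder latent space, then enter the reverse diffusion trajectory at the matching noise level, so lightly degraded inputs get fewer denoising steps.

```python
import numpy as np

def reverse_start_step(encoder, severity_head, noise_schedule, y, T=1000):
    """Pick where to enter the reverse diffusion trajectory: higher estimated
    degradation severity means starting earlier (more denoising steps).
    All names here are hypothetical stand-ins, not the paper's API."""
    z = encoder(y)                                   # latent of degraded input
    s = float(np.clip(severity_head(z), 0.0, 1.0))   # predicted severity
    t = int(np.searchsorted(noise_schedule, s))      # matching noise level
    return min(max(t, 1), T - 1)

# Toy stand-ins: severity grows with how much latent energy was destroyed.
encoder = lambda y: y
severity_head = lambda z: 1.0 - np.linalg.norm(z) / np.sqrt(z.size)
schedule = np.linspace(0.0, 1.0, 1000)               # monotone sigma(t)

mildly_degraded = np.ones(64) * 0.9
heavily_degraded = np.ones(64) * 0.2
for y in (mildly_degraded, heavily_degraded):
    print(reverse_start_step(encoder, severity_head, schedule, y))
```

The second input starts much deeper in the trajectory than the first, which is exactly the per-sample compute allocation the abstract describes.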
{"title":"Adapt and Diffuse: Sample-Adaptive Reconstruction Via Latent Diffusion Models.","authors":"Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the structure of the ground truth signal, the severity of the degradation and the complex interactions between the above. This results in natural sample-by-sample variation in the difficulty of a reconstruction task, which is often overlooked by contemporary techniques. Our key observation is that most existing inverse problem solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in subpar performance and wasteful resource allocation. We propose a novel method that we call severity encoding, to estimate the degradation severity of noisy, degraded signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can give useful hints at the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. Our framework acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration. We perform numerical experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to 10× acceleration in mean sampling speed.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"12723-12753"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11421836/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142334004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources.
Meng Xia, Jonathan Wilson, Benjamin Goldstein, Ricardo Henao

The use of machine learning models to predict clinical outcomes from (longitudinal) electronic health record (EHR) data is becoming increasingly popular due to advances in deep architectures, representation learning, and the growing availability of large EHR datasets. Existing models generally assume access to the same data sources during both training and inference stages. However, this assumption is often challenged by the fact that real-world clinical datasets originate from various data sources (with distinct sets of covariates) which, though available for training (in a research or retrospective setting), are more realistically only partially available (a subset of such sets) for inference when deployed. So motivated, we introduce Contrastive Learning for clinical Outcome Prediction with Partial data Sources (CLOPPS), which trains encoders to capture information across different data sources and then leverages them to build classifiers restricted to a single data source. This approach can be used with existing cross-sectional or longitudinal outcome classification models. We present experiments on two real-world datasets demonstrating that CLOPPS consistently outperforms strong baselines in several practical scenarios.

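As a rough illustration of cross-source alignment (not the CLOPPS objective itself), the sketch below computes an InfoNCE-style loss that pulls together two encoders' embeddings of the same patient and pushes apart mismatched pairs; the shapes and the shared-state simulation are assumptions.

```python
import numpy as np

def info_nce(za, zb, tau=0.1):
    """Cross-source contrastive loss: row i of za (source A) should match
    row i of zb (source B) among all candidates in the batch. Illustrative."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau                     # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))               # matched pairs on the diagonal

rng = np.random.default_rng(3)
h = rng.normal(size=(32, 8))                 # shared patient state (hypothetical)
za = h @ rng.normal(size=(8, 16))            # encoder A view (e.g., labs)
zb = h @ rng.normal(size=(8, 16))            # encoder B view (e.g., vitals)
print(info_nce(za, zb))                      # lower loss for aligned pairs...
print(info_nce(za, zb[rng.permutation(32)])) # ...than for shuffled pairs
```

An encoder trained to minimize such a loss yields embeddings that remain informative when only one of the sources is available at inference time, which is the deployment scenario the paper targets.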
{"title":"Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources.","authors":"Meng Xia, Jonathan Wilson, Benjamin Goldstein, Ricardo Henao","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The use of machine learning models to predict clinical outcomes from (longitudinal) electronic health record (EHR) data is becoming increasingly popular due to advances in deep architectures, representation learning, and the growing availability of large EHR datasets. Existing models generally assume access to the same data sources during both training and inference stages. However, this assumption is often challenged by the fact that real-world clinical datasets originate from various data sources (with distinct sets of covariates), which though can be available for training (in a research or retrospective setting), are more realistically only partially available (a subset of such sets) for inference when deployed. So motivated, we introduce Contrastive Learning for clinical Outcome Prediction with Partial data Sources (CLOPPS), that trains encoders to capture information across different data sources and then leverages them to build classifiers restricting access to a single data source. This approach can be used with existing cross-sectional or longitudinal outcome classification models. We present experiments on two real-world datasets demonstrating that CLOPPS consistently outperforms strong baselines in several practical scenarios.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"54156-54177"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11326519/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency.
Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi

Diffusion models have established new state of the art in a multitude of computer vision tasks, including image restoration. Diffusion-based inverse problem solvers generate reconstructions of exceptional visual quality from heavily corrupted measurements. However, in what is widely known as the perception-distortion trade-off, the price of perceptually appealing reconstructions is often paid in declined distortion metrics, such as PSNR. Distortion metrics measure faithfulness to the observation, a crucial requirement in inverse problems. In this work, we propose a novel framework for inverse problem solving, namely we assume that the observation comes from a stochastic degradation process that gradually degrades and noises the original clean image. We learn to reverse the degradation process in order to recover the clean image. Our technique maintains consistency with the original measurement throughout the reverse process, and allows for great flexibility in trading off perceptual quality for improved distortion metrics and sampling speedup via early-stopping. We demonstrate the efficiency of our method on different high-resolution datasets and inverse problems, achieving great improvements over other state-of-the-art diffusion-based methods with respect to both perceptual and distortion metrics.

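The measurement-consistency ingredient can be shown in isolation. Below is a toy sketch, assuming a linear degradation A: a gradient step on ||y - Ax||^2 pulls the current iterate back toward the observation. In a diffusion solver such steps would be interleaved with denoising, which is omitted here.

```python
import numpy as np

def data_consistency_step(x, y, A, eta):
    """Gradient step on ||y - Ax||^2: nudge the current iterate back toward
    agreement with the original measurement. Shown in isolation; in practice
    this would alternate with reverse-diffusion denoising steps."""
    return x + eta * A.T @ (y - A @ x)

rng = np.random.default_rng(4)
n, m = 64, 32
A = rng.normal(size=(m, n)) / np.sqrt(m)      # toy linear degradation operator
x_clean = rng.normal(size=n)
y = A @ x_clean + 0.01 * rng.normal(size=m)   # corrupted measurement

x = rng.normal(size=n)                        # stand-in for a diffusion iterate
eta = 1.0 / np.linalg.norm(A, 2) ** 2         # step size from the spectral norm
for _ in range(300):
    x = data_consistency_step(x, y, A, eta)
print(f"measurement residual ||Ax - y||: {np.linalg.norm(A @ x - y):.4f}")
```

Keeping this residual small throughout the reverse process is what the abstract means by consistency with the original measurement; early-stopping the reverse process then trades perceptual quality against distortion.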
{"title":"DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency.","authors":"Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Diffusion models have established new state of the art in a multitude of computer vision tasks, including image restoration. Diffusion-based inverse problem solvers generate reconstructions of exceptional visual quality from heavily corrupted measurements. However, in what is widely known as the perception-distortion trade-off, the price of perceptually appealing reconstructions is often paid in declined distortion metrics, such as PSNR. Distortion metrics measure faithfulness to the observation, a crucial requirement in inverse problems. In this work, we propose a novel framework for inverse problem solving, namely we assume that the observation comes from a stochastic degradation process that gradually degrades and noises the original clean image. We learn to reverse the degradation process in order to recover the clean image. Our technique maintains consistency with the original measurement throughout the reverse process, and allows for great flexibility in trading off perceptual quality for improved distortion metrics and sampling speedup via early-stopping. We demonstrate the efficiency of our method on different high-resolution datasets and inverse problems, achieving great improvements over other state-of-the-art diffusion-based methods with respect to both perceptual and distortion metrics.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"12754-12783"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483186/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability.
Sepanta Zeighami, Cyrus Shahabi

Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods, undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the distribution learnability framework and novel theoretical tools which build the foundation for the analysis of learned database operations in the future.

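For readers unfamiliar with the operations being analyzed, here is a toy learned index, one of the motivating examples of this line of work (the linear key-to-position model and error-window search below are my illustrative choices, not the paper's construction): a model approximates the keys' CDF to predict a record's position, and a bounded local search corrects it. Distribution shift inflates the required search window, which is the degradation regime the paper characterizes.

```python
import numpy as np

class LearnedIndex:
    """Toy learned index: a linear model of key -> position (an approximate
    CDF), plus a last-mile binary search within the observed error bound."""
    def __init__(self, keys):
        self.keys = np.sort(keys)
        pos = np.arange(len(self.keys))
        self.slope, self.intercept = np.polyfit(self.keys, pos, 1)
        pred = self.slope * self.keys + self.intercept
        self.max_err = int(np.ceil(np.max(np.abs(pred - pos))))

    def lookup(self, key):
        guess = int(self.slope * key + self.intercept)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = lo + int(np.searchsorted(self.keys[lo:hi], key))
        return i if i < len(self.keys) and self.keys[i] == key else None

rng = np.random.default_rng(5)
keys = np.sort(rng.uniform(0.0, 1.0, 10_000))
idx = LearnedIndex(keys)
print("search window (max model error):", idx.max_err)
print(idx.lookup(keys[1234]) == 1234)
# Keys arriving from a shifted distribution would inflate the window the
# model needs, eroding the learned index's advantage over a plain B-tree.
```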
{"title":"Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability.","authors":"Sepanta Zeighami, Cyrus Shahabi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the <i>distribution learnability</i> framework and novel theoretical tools which build the foundation for the analysis of learned database operations in the future.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"58283-58305"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534081/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142577095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation.
Yinjun Wu, Mayank Keoliya, Kan Chen, Neelay Velingker, Ziyang Li, Emily J Getzen, Qi Long, Mayur Naik, Ravi B Parikh, Eric Wong

Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.

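The "explanation as database query" idea can be sketched directly. This shows only the querying step; the paper's contribution, RL-based synthesis of the rules, is not shown, and the data-generating process and rule below are made up for illustration: a rule selects a similar subgroup, and the subgroup's treated-versus-control outcome gap serves as the sample's effect estimate.

```python
import numpy as np

def rule_based_ite(rule, X, treat, y):
    """Treat the rule as a query: select matching rows, then estimate the
    effect as the treated-vs-control mean outcome gap in that subgroup."""
    mask = rule(X)
    t, c = y[mask & (treat == 1)], y[mask & (treat == 0)]
    return t.mean() - c.mean(), mask.sum()

rng = np.random.default_rng(6)
n = 5000
X = rng.normal(size=(n, 3))
treat = rng.integers(0, 2, size=n)               # randomized treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)            # effect depends on feature 0
y = X @ [1.0, -0.5, 0.2] + tau * treat + rng.normal(size=n)

# Hypothetical synthesized explanation for a sample with x0 = 1.3:
rule = lambda X: (X[:, 0] > 0.5) & (X[:, 1] < 1.0)
est, size = rule_based_ite(rule, X, treat, y)
print(f"subgroup n={size}, estimated effect {est:.2f} (true 2.0 for x0 > 0)")
```

The rule itself is the explanation: it is faithful by construction, because the prediction is literally computed from the subgroup it names.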
{"title":"DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation.","authors":"Yinjun Wu, Mayank Keoliya, Kan Chen, Neelay Velingker, Ziyang Li, Emily J Getzen, Qi Long, Mayur Naik, Ravi B Parikh, Eric Wong","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as <i>database queries</i> to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"53597-53618"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11350397/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142115743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters.
Brian Cho, Yaroslav Mukhin, Kyra Gan, Ivana Malenica

When estimating target parameters in nonparametric models with nuisance parameters, substituting the unknown nuisances with nonparametric estimators can introduce "plug-in bias." Traditional methods addressing this suboptimal bias-variance trade-off rely on the influence function (IF) of the target parameter. When estimating multiple target parameters, these methods require debiasing the nuisance parameter multiple times using the corresponding IFs, which poses analytical and computational challenges. In this work, we leverage the targeted maximum likelihood estimation (TMLE) framework to propose a novel method named kernel debiased plug-in estimation (KDPE). KDPE refines an initial estimate through regularized likelihood maximization steps, employing a nonparametric model based on reproducing kernel Hilbert spaces. We show that KDPE: (i) simultaneously debiases all pathwise differentiable target parameters that satisfy our regularity conditions, (ii) does not require the IF for implementation, and (iii) remains computationally tractable. We numerically illustrate the use of KDPE and validate our theoretical results.

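For contrast with what KDPE avoids, here is the classical influence-function route on a simulated randomized ATE problem: a deliberately over-shrunk outcome model biases the plug-in estimate, and the standard one-step (AIPW-style) correction removes that bias. KDPE targets the same kind of debiasing without deriving the influence function; this sketch is background, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
X = rng.normal(size=n)
e = 0.5                                        # known randomization probability
T = rng.random(n) < e
Y = X + T * 1.0 + rng.normal(size=n)           # true ATE = 1.0

# Deliberately over-regularized nuisance: outcome model shrunk toward zero.
mu1 = lambda x: 0.5 * (x + 1.0)                # fitted E[Y | X, T=1], too flat
mu0 = lambda x: 0.5 * x                        # fitted E[Y | X, T=0], too flat

plug_in = np.mean(mu1(X) - mu0(X))             # inherits the nuisance bias
correction = np.mean(T / e * (Y - mu1(X))
                     - (1 - T) / (1 - e) * (Y - mu0(X)))
one_step = plug_in + correction                # influence-function correction
print(f"plug-in {plug_in:.3f}, IF-corrected {one_step:.3f}, truth 1.000")
```

The correction term is exactly the (known) efficient influence function for the ATE; KDPE's appeal is that it debiases many such target parameters at once without this hand derivation.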
{"title":"Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters.","authors":"Brian Cho, Yaroslav Mukhin, Kyra Gan, Ivana Malenica","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>When estimating target parameters in nonparametric models with nuisance parameters, substituting the unknown nuisances with nonparametric estimators can introduce \"plug-in bias.\" Traditional methods addressing this suboptimal bias-variance trade-off rely on the <i>influence function</i> (IF) of the target parameter. When estimating multiple target parameters, these methods require debiasing the nuisance parameter multiple times using the corresponding IFs, which poses analytical and computational challenges. In this work, we leverage the <i>targeted maximum likelihood estimation</i> (TMLE) framework to propose a novel method named <i>kernel debiased plug-in estimation</i> (KDPE). KDPE refines an initial estimate through regularized likelihood maximization steps, employing a nonparametric model based on <i>reproducing kernel Hilbert spaces</i>. We show that KDPE: (i) simultaneously debiases <i>all</i> pathwise differentiable target parameters that satisfy our regularity conditions, (ii) does not require the IF for implementation, and (iii) remains computationally tractable. We numerically illustrate the use of KDPE and validate our theoretical results.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"8534-8555"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11359899/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142115744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges.
Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C Wallace

Unstructured data in Electronic Health Records (EHRs) often contains critical information, complementary to imaging, that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs as a mechanism to efficiently retrieve and summarize unstructured evidence in patient EHR relevant to a given query. Our method entails tasking an LLM to infer whether a patient has, or is at risk of, a particular condition on the basis of associated notes; if so, we ask the model to summarize the supporting evidence. Under expert evaluation, we find that this LLM-based approach provides outputs consistently preferred to a pre-LLM information retrieval baseline. Manual evaluation is expensive, so we also propose and validate a method using an LLM to evaluate (other) LLM outputs for this task, allowing us to scale up evaluation. Our findings indicate the promise of LLMs as interfaces to EHR, but also highlight the outstanding challenge posed by "hallucinations". In this setting, however, we show that model confidence in outputs strongly correlates with faithful summaries, offering a practical means to limit confabulations.

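The zero-shot strategy reduces to prompt construction plus a completion call. The sketch below shows one plausible shape for such a prompt; the wording and the complete() function are placeholders, not the paper's exact prompts or any specific API.

```python
def build_evidence_prompt(condition, notes):
    """Zero-shot prompt: ask whether the notes indicate the condition (or risk
    of it) and, if so, request a summary of the supporting evidence.
    The wording here is a placeholder, not the paper's exact prompt."""
    joined = "\n\n".join(f"[NOTE {i + 1}]\n{t}" for i, t in enumerate(notes))
    return (
        f"You are assisting a radiologist. Patient notes:\n{joined}\n\n"
        f"Question: Does the patient have, or are they at risk of, {condition}? "
        "Answer YES or NO. If YES, summarize the supporting evidence, citing "
        "note numbers. If the notes contain no evidence, say so explicitly."
    )

# Usage with a hypothetical completion function `complete(prompt) -> str`:
prompt = build_evidence_prompt(
    "pulmonary embolism",
    ["Pt reports acute pleuritic chest pain.", "D-dimer elevated on admission."],
)
# answer = complete(prompt)  # an LLM API call, omitted here
print(prompt[:120])
```

Asking the model to cite note numbers gives the reviewer a direct handle for checking faithfulness, the failure mode ("hallucinations") the abstract flags.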
{"title":"Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges.","authors":"Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C Wallace","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Unstructured data in Electronic Health Records (EHRs) often contains critical information-complementary to imaging-that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs as a mechanism to efficiently retrieve and summarize unstructured evidence in patient EHR relevant to a given query. Our method entails tasking an LLM to infer whether a patient has, or is at risk of, a particular condition on the basis of associated notes; if so, we ask the model to summarize the supporting evidence. Under expert evaluation, we find that this LLM-based approach provides outputs consistently preferred to a pre-LLM information retrieval baseline. Manual evaluation is expensive, so we also propose and validate a method using an LLM to evaluate (other) LLM outputs for this task, allowing us to scale up evaluation. Our findings indicate the promise of LLMs as interfaces to EHR, but also highlight the outstanding challenge posed by \"hallucinations\". In this setting, however, we show that model confidence in outputs strongly correlates with faithful summaries, offering a practical means to limit confabulations.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"248 ","pages":"489-505"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368037/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0