
Latest publications in Proceedings of Machine Learning Research

QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks.
Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes (≤ 4 bits per weight) using three novel techniques. First, QuIP# improves QuIP's (Chee et al., 2023) incoherence processing by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric E8 lattice, which achieves the optimal 8-dimension unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference. Our code can be found at https://github.com/Cornell-RelaxML/quip-sharp.
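The randomized Hadamard transform mentioned in the abstract can be sketched in a few lines of numpy. This is an illustrative sketch of the general technique, not QuIP#'s implementation; all function names are ours. The key property it demonstrates is that the transform is orthogonal, so incoherence processing can be undone exactly after quantization:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction: n x n Hadamard matrix, n a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def randomized_hadamard(W, seed=0):
    """Apply a randomized Hadamard transform on both sides of a weight matrix W.

    Random sign flips (sl, sr) combined with normalized Hadamard matrices give
    an orthogonal transform that spreads out large entries (incoherence)."""
    rng = np.random.default_rng(seed)
    n, m = W.shape
    Hl = hadamard(n) / np.sqrt(n)          # orthogonal: Hl.T @ Hl == I
    Hr = hadamard(m) / np.sqrt(m)
    sl = rng.choice([-1.0, 1.0], size=n)   # random sign flips
    sr = rng.choice([-1.0, 1.0], size=m)
    Wt = (Hl * sl) @ W @ (Hr * sr).T
    return Wt, (Hl, sl, Hr, sr)

def invert_hadamard(Wt, params):
    """Exactly undo the randomized Hadamard transform (e.g., at inference)."""
    Hl, sl, Hr, sr = params
    return (Hl * sl).T @ Wt @ (Hr * sr)
```

In the actual method the transform would be fused into inference kernels rather than materialized as dense matrices; the sketch only shows the algebra.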

Proceedings of Machine Learning Research, vol. 235, pp. 48630-48656, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395268/pdf/
Citations: 0
Multi-Source Conformal Inference Under Distribution Shift.
Yi Liu, Alexander W Levis, Sharon-Lise Normand, Larry Han

Recent years have experienced increasing utilization of complex machine learning models across multiple sources of data to inform more generalizable decision-making. However, distribution shifts across data sources and privacy concerns related to sharing individual-level data, coupled with a lack of uncertainty quantification from machine learning predictions, make it challenging to achieve valid inferences in multi-source environments. In this paper, we consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations, and show that one can incorporate machine learning prediction algorithms in the estimation of nuisance functions while still achieving parametric rates of convergence to nominal coverage probabilities. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments. Hospital length of stay prediction intervals for pediatric patients undergoing a high-risk cardiac surgical procedure between 2016-2022 in the U.S. illustrate the utility of our methodology.
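For background on the distribution-free intervals the abstract targets, the basic single-source split conformal recipe can be sketched as follows. This is a minimal illustration of standard conformal prediction, not the paper's multi-source estimator; the names and the toy residual distribution are ours:

```python
import numpy as np

def conformal_quantile(cal_residuals, alpha=0.1):
    """Finite-sample-corrected quantile of calibration residuals; intervals
    mu(x) +/- q then have >= 1 - alpha marginal coverage."""
    n = len(cal_residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_residuals, level, method="higher")

# Toy demo: for an oracle mean predictor, absolute residuals are just |noise|.
rng = np.random.default_rng(0)
res_cal = np.abs(rng.normal(0.0, 0.3, size=500))    # calibration residuals
q = conformal_quantile(res_cal, alpha=0.1)
res_test = np.abs(rng.normal(0.0, 0.3, size=2000))  # fresh test residuals
coverage = np.mean(res_test <= q)                   # empirically close to 0.9
```

The paper's contribution lies in making such guarantees hold for a target population when the calibration data come from multiple shifted sources; the sketch only shows the single-source baseline being generalized.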

Proceedings of Machine Learning Research, vol. 235, pp. 31344-31382, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11345809/pdf/
Citations: 0
Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer.
Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, Mark van der Laan

We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the mean of counterfactual outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, following the targeted minimum loss-based likelihood estimation (TMLE) framework, we statistically corrected for the bias commonly associated with machine learning algorithms. Furthermore, our method also facilitates statistical inference by enabling the provision of 95% confidence intervals grounded in asymptotic statistical theory. Simulation results demonstrate our method's superior performance over existing approaches, particularly in complex, long time-horizon scenarios. It remains effective in small-sample, short-duration contexts, matching the performance of asymptotically efficient estimators. To demonstrate our method in practice, we applied our method to estimate counterfactual mean outcomes for standard versus intensive blood pressure management strategies in a real-world cardiovascular epidemiology cohort study.
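The temporal-difference learning the abstract builds on can be illustrated with the classic tabular TD(0) backup; this is a generic sketch of the underlying idea, not the paper's transformer-based estimator:

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) backup: move V[s] toward the bootstrapped target
    r + gamma * V[s_next] (just r when the transition is terminal)."""
    target = r if terminal else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

# Two-state chain: s0 -> s1 (reward 0), s1 -> terminal (reward 1).
V = np.zeros(2)
for _ in range(2000):
    V = td0_update(V, 1, 1.0, None, terminal=True)
    V = td0_update(V, 0, 0.0, 1)
# Both values converge to the expected return of 1.
```

In Deep LTMLE the same bootstrapping principle propagates counterfactual outcome targets backward through time, with the transformer playing the role of the value table.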

Proceedings of Machine Learning Research, vol. 235, pp. 45097-45113, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12681028/pdf/
Citations: 0
Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning.
Jiaqi Wang, Chenxu Zhao, Lingjuan Lyu, Quanzeng You, Mengdi Huai, Fenglong Ma

This paper presents FedType, a simple yet pioneering framework designed to fill research gaps in heterogeneous model aggregation within federated learning (FL). FedType introduces small identical proxy models for clients, serving as agents for information exchange, ensuring model security, and achieving efficient communication simultaneously. To transfer knowledge between large private and small proxy models on clients, we propose a novel uncertainty-based asymmetrical reciprocity learning method, eliminating the need for any public data. Comprehensive experiments conducted on benchmark datasets demonstrate the efficacy and generalization ability of FedType across diverse settings. Our approach redefines federated learning paradigms by bridging model heterogeneity, eliminating reliance on public data, prioritizing client privacy, and reducing communication costs.
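The knowledge transfer between a large private model and its small proxy is a distillation-style exchange; a minimal sketch of the usual temperature-softened KL objective is below. This illustrates plain bidirectional distillation, not FedType's uncertainty-based asymmetrical variant; all names are ours:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(logits_teacher, logits_student, T=2.0):
    """KL(teacher || student) on temperature-softened predictions, the standard
    objective for transferring knowledge between two models. The T*T factor
    keeps gradient magnitudes comparable across temperatures."""
    p = softmax(logits_teacher, T)
    q = softmax(logits_student, T)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean() * T * T)
```

In FedType only the small proxy models travel to the server, so the private model never leaves the client; the distillation direction and weighting are further modulated by predictive uncertainty.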

Proceedings of Machine Learning Research, vol. 235, pp. 52290-52308, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12038961/pdf/
Citations: 0
Adapt and Diffuse: Sample-Adaptive Reconstruction Via Latent Diffusion Models.
Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi

Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the structure of the ground truth signal, the severity of the degradation and the complex interactions between the above. This results in natural sample-by-sample variation in the difficulty of a reconstruction task, which is often overlooked by contemporary techniques. Our key observation is that most existing inverse problem solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in subpar performance and wasteful resource allocation. We propose a novel method that we call severity encoding, to estimate the degradation severity of noisy, degraded signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can give useful hints at the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. Our framework acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration. We perform numerical experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to 10× acceleration in mean sampling speed.
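The severity-to-compute mapping described in the abstract can be sketched abstractly: score how far a degraded sample's latent sits from the clean manifold, then spend reverse-diffusion steps in proportion. This is a toy illustration of the idea, not the paper's autoencoder or sampler; both functions and their parameters are ours:

```python
import numpy as np

def estimate_severity(z_clean_mean, z, sigma=1.0):
    """Toy severity score: distance of a degraded sample's latent z from a
    reference point on the clean latent manifold, in units of sigma."""
    return np.linalg.norm(z - z_clean_mean) / sigma

def adaptive_num_steps(severity, max_steps=1000, s_max=10.0):
    """Map estimated severity to a truncated reverse-diffusion step budget:
    mildly degraded samples start the reverse process late and finish fast."""
    frac = np.clip(severity / s_max, 0.0, 1.0)
    return max(1, int(round(frac * max_steps)))
```

The actual method predicts severity with a trained encoder head and uses it to pick where on the diffusion trajectory to start sampling; the sketch only shows the adaptive budgeting logic.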

Proceedings of Machine Learning Research, vol. 235, pp. 12723-12753, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11421836/pdf/
Citations: 0
Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources.
Meng Xia, Jonathan Wilson, Benjamin Goldstein, Ricardo Henao

The use of machine learning models to predict clinical outcomes from (longitudinal) electronic health record (EHR) data is becoming increasingly popular due to advances in deep architectures, representation learning, and the growing availability of large EHR datasets. Existing models generally assume access to the same data sources during both training and inference stages. However, this assumption is often challenged by the fact that real-world clinical datasets originate from various data sources (with distinct sets of covariates), which though can be available for training (in a research or retrospective setting), are more realistically only partially available (a subset of such sets) for inference when deployed. So motivated, we introduce Contrastive Learning for clinical Outcome Prediction with Partial data Sources (CLOPPS), that trains encoders to capture information across different data sources and then leverages them to build classifiers restricting access to a single data source. This approach can be used with existing cross-sectional or longitudinal outcome classification models. We present experiments on two real-world datasets demonstrating that CLOPPS consistently outperforms strong baselines in several practical scenarios.
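The cross-source alignment at the heart of such contrastive training is typically an InfoNCE-style objective: embeddings of the same patient from two data sources are pulled together, embeddings of different patients pushed apart. A minimal numpy sketch of InfoNCE follows; it illustrates the generic loss, not CLOPPS's encoders or training setup:

```python
import numpy as np

def info_nce(za, zb, tau=0.1):
    """InfoNCE loss where row i of za and row i of zb are a positive pair
    (e.g., the same patient seen through two data sources); all other rows
    in the batch act as negatives."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau                          # (n, n) similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))        # cross-entropy on the diagonal
```

Once the encoders are aligned this way, a classifier trained on one source's embedding can be served at inference time even when only that source is available.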

Proceedings of Machine Learning Research, vol. 235, pp. 54156-54177, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11326519/pdf/
Citations: 0
DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency.
Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi

Diffusion models have established new state of the art in a multitude of computer vision tasks, including image restoration. Diffusion-based inverse problem solvers generate reconstructions of exceptional visual quality from heavily corrupted measurements. However, in what is widely known as the perception-distortion trade-off, the price of perceptually appealing reconstructions is often paid in declined distortion metrics, such as PSNR. Distortion metrics measure faithfulness to the observation, a crucial requirement in inverse problems. In this work, we propose a novel framework for inverse problem solving, namely we assume that the observation comes from a stochastic degradation process that gradually degrades and noises the original clean image. We learn to reverse the degradation process in order to recover the clean image. Our technique maintains consistency with the original measurement throughout the reverse process, and allows for great flexibility in trading off perceptual quality for improved distortion metrics and sampling speedup via early-stopping. We demonstrate the efficiency of our method on different high-resolution datasets and inverse problems, achieving great improvements over other state-of-the-art diffusion-based methods with respect to both perceptual and distortion metrics.
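The stochastic degradation process the abstract assumes can be pictured as a chain that alternates a deterministic degradation operator with additive noise; the reverse model learns to undo one step at a time. The sketch below uses a toy 1-D blur as the degradation operator; it is illustrative only, not the paper's operator or sampler:

```python
import numpy as np

def degrade_step(x, noise_std, rng):
    """One step of a toy degradation operator: 1-D neighbor averaging
    (a mild blur) followed by additive Gaussian noise."""
    blurred = 0.5 * x + 0.25 * np.roll(x, 1) + 0.25 * np.roll(x, -1)
    return blurred + rng.normal(0.0, noise_std, size=x.shape)

def forward_chain(x0, T=10, noise_std=0.05, seed=0):
    """Gradually degrade a clean signal x0 into x_T; a reverse model would be
    trained to invert each step while staying consistent with the observation."""
    rng = np.random.default_rng(seed)
    xs = [x0]
    for _ in range(T):
        xs.append(degrade_step(xs[-1], noise_std, rng))
    return xs
```

Because each intermediate x_t remains tied to the observed measurement, reconstruction can be stopped early for a better distortion/perception trade-off, as the abstract describes.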

Proceedings of Machine Learning Research, vol. 235, pp. 12754-12783, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483186/pdf/
Citations: 0
From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions.
Trenton Chang, Jenna Wiens

Selective labels occur when label observations are subject to a decision-making process; e.g., diagnoses that depend on the administration of laboratory tests. We study a clinically-inspired selective label problem called disparate censorship, where labeling biases vary across subgroups and unlabeled individuals are imputed as "negative" (i.e., no diagnostic test = no illness). Machine learning models naïvely trained on such labels could amplify labeling bias. Inspired by causal models of selective labels, we propose Disparate Censorship Expectation-Maximization (DCEM), an algorithm for learning in the presence of disparate censorship. We theoretically analyze how DCEM mitigates the effects of disparate censorship on model performance. We validate DCEM on synthetic data, showing that it improves bias mitigation (area between ROC curves) without sacrificing discriminative performance (AUC) compared to baselines. We achieve similar results in a sepsis classification task using clinical data.
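The core EM idea, replacing the naive "untested = negative" imputation with model-based soft pseudo-labels, can be sketched with a toy logistic model. This is a generic EM-with-pseudo-labels illustration, not the DCEM algorithm (which additionally models the censorship mechanism); all names are ours:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def em_pseudo_labels(X, y_obs, tested, n_em=20, n_grad=50, lr=0.5):
    """Toy EM for selective labels: untested individuals receive soft
    pseudo-labels from the previous model fit instead of an imputed 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_em):
        # E-step: keep observed labels where tested, model posterior elsewhere.
        y_soft = np.where(tested, y_obs, sigmoid(X @ w))
        # M-step: gradient ascent on the logistic log-likelihood vs. y_soft.
        for _ in range(n_grad):
            p = sigmoid(X @ w)
            w += lr * X.T @ (y_soft - p) / len(X)
    return w
```

A quick synthetic check: when testing is biased toward high-risk individuals and untested cases would otherwise be imputed negative, the EM fit still recovers a positive risk coefficient.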

Proceedings of Machine Learning Research, vol. 235, pp. 6286-6324, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199211/pdf/
Citations: 0
Network conditioning for synergistic learning on partial annotations.
Benjamin Billot, Neel Dey, Esra Abaci Turk, P Ellen Grant, Polina Golland

The robustness and accuracy of multi-organ segmentation networks is limited by the scarcity of labels. A common strategy to alleviate the annotation burden is to use partially labelled datasets, where each image can be annotated for a subset of all organs of interest. Unfortunately, this approach causes inconsistencies in the background class since it can now include target organs. Moreover, we consider the even more relaxed setting of region-based segmentation, where voxels can be labelled for super-regions, thus causing further inconsistencies across annotations. Here we propose CoNeMOS (Conditional Network for Multi-Organ Segmentation), a framework that leverages a label-conditioned network for synergistic learning on partially labelled region-based segmentations. Conditioning is achieved by combining convolutions with expressive Feature-wise Linear Modulation (FiLM) layers, whose parameters are controlled by an auxiliary network. In contrast to other conditioning methods, FiLM layers are stable to train and add negligible computation overhead, which enables us to condition the entire network. As a result, the network can learn where it needs to extract shared or label-specific features, instead of imposing it with the architecture (e.g., with different segmentation heads). By encouraging flexible synergies across labels, our method obtains state-of-the-art results for the segmentation of challenging low-resolution fetal MRI data. Our code is available at https://github.com/BBillot/CoNeMOS.
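The FiLM layers the abstract relies on are simple: an auxiliary network predicts a per-channel scale (gamma) and shift (beta) from the conditioning vector, which then modulate the feature map. A minimal numpy sketch follows; the shapes and weight names are illustrative, not CoNeMOS's implementation:

```python
import numpy as np

def film(features, cond, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise Linear Modulation: per-channel scale and shift predicted
    from a conditioning vector (e.g., an encoding of the requested label
    subset), broadcast over the spatial dimensions of the feature map."""
    gamma = cond @ W_gamma + b_gamma       # (batch, channels)
    beta = cond @ W_beta + b_beta          # (batch, channels)
    # features: (batch, channels, H, W); broadcast gamma/beta over H, W.
    return features * gamma[:, :, None, None] + beta[:, :, None, None]
```

With gamma fixed at 1 and beta at 0, FiLM is the identity, which is why it is cheap and stable to train: the modulation only departs from identity where the conditioning signal demands it.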

Proceedings of Machine Learning Research, vol. 250, pp. 119-130, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393823/pdf/
Citations: 0
Self-supervised pretraining in the wild imparts image acquisition robustness to medical image transformers: an application to lung cancer segmentation.
Jue Jiang, Harini Veeraraghavan

Self-supervised learning (SSL) is an approach to pretrain models with unlabeled datasets and extract useful feature representations such that these models can be easily fine-tuned for various downstream tasks. Self-pretraining applies SSL on curated task-specific datasets without using task-specific labels. The increasing availability of public data repositories has now made it possible to pretrain models in the "wild" with SSL, using diverse, large, task-unrelated datasets. However, the benefit of such wild-pretraining over self-pretraining has not been studied in the context of medical image analysis. Hence, we analyzed transformers (Swin and ViT) and a convolutional neural network, created with wild- and self-pretraining and trained to segment lung tumors from 3D computed tomography (CT) scans, in terms of: (a) accuracy, (b) fine-tuning epoch efficiency, and (c) robustness to image acquisition differences (contrast versus non-contrast, slice thickness, and image reconstruction kernels). We also studied feature reuse in the Swin networks using centered kernel alignment (CKA). Our analysis with two independent test datasets (public N = 139; internal N = 196) showed that wild-pretrained Swin models significantly outperformed self-pretrained Swin for the various imaging acquisitions. Fine-tuning epoch efficiency was higher for both wild-pretrained Swin and ViT models compared to their self-pretrained counterparts. Feature reuse close to the final encoder layers was lower than in the early layers for wild-pretrained models, irrespective of the pretext tasks used in SSL. Models and code will be made available through GitHub upon manuscript acceptance.
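Centered kernel alignment with a linear kernel, as used for the feature-reuse analysis above, has a short closed form: centre both activation matrices, then normalise the squared Frobenius norm of their cross-covariance. A minimal sketch follows, assuming the standard linear-CKA formula rather than the authors' exact code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activation matrices.

    X : (n, p) and Y : (n, q) hold features for the same n examples.
    Returns a similarity in [0, 1]; 1 means identical up to an
    orthogonal transformation and isotropic scaling.
    """
    X = X - X.mean(axis=0)  # centre each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

Comparing `linear_cka` between corresponding layers of two networks (e.g., before and after fine-tuning) is one common way to quantify feature reuse layer by layer.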

Proceedings of Machine Learning Research, vol. 250, pp. 708-721, July 2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11741178/pdf/
Citations: 0