
Proceedings of the ... International Conference on Machine Learning: Latest Publications

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models
Liangbin Xie, Xintao Wang, Xiangyu Chen, Gengyan Li, Ying Shan, Jiantao Zhou, Chao Dong
Image super-resolution (SR) with generative adversarial networks (GAN) has achieved great success in restoring realistic details. However, it is notorious that GAN-based SR models will inevitably produce unpleasant and undesirable artifacts, especially in practical scenarios. Previous works typically suppress artifacts with an extra loss penalty in the training phase. They only work for in-distribution artifact types generated during training. When applied in real-world scenarios, we observe that those improved methods still generate obviously annoying artifacts during inference. In this paper, we analyze the cause and characteristics of the GAN artifacts produced in unseen test data without ground-truths. We then develop a novel method, namely, DeSRA, to Detect and then Delete those SR Artifacts in practice. Specifically, we propose to measure a relative local variance distance from MSE-SR results and GAN-SR results, and locate the problematic areas based on the above distance and semantic-aware thresholds. After detecting the artifact regions, we develop a finetune procedure to improve GAN-based SR models with a few samples, so that they can deal with similar types of artifacts in more unseen real data. Equipped with our DeSRA, we can successfully eliminate artifacts from inference and improve the ability of SR models to be applied in real-world scenarios. The code will be available at https://github.com/TencentARC/DeSRA.
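For illustration only, here is a minimal sketch of the detection step the abstract describes: windowed local variance is computed for the MSE-SR and GAN-SR outputs, a relative distance map is formed, and a threshold yields a candidate artifact mask. The window size, the single scalar threshold (standing in for the paper's semantic-aware thresholds), and the exact distance formula are assumptions, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, win=11):
    """Windowed variance: E[x^2] - E[x]^2 over a win x win neighborhood."""
    mean = uniform_filter(img, size=win)
    mean_sq = uniform_filter(img * img, size=win)
    return np.clip(mean_sq - mean * mean, 0.0, None)

def artifact_mask(mse_sr, gan_sr, win=11, threshold=0.5, eps=1e-6):
    """Flag pixels where GAN-SR texture strength deviates strongly from MSE-SR.

    `threshold` is a single illustrative scalar standing in for the paper's
    semantic-aware thresholds; the relative-distance formula is one plausible
    reading of "relative local variance distance".
    """
    var_mse = local_variance(mse_sr, win)
    var_gan = local_variance(gan_sr, win)
    rel_dist = np.abs(var_gan - var_mse) / (var_mse + var_gan + eps)
    return rel_dist > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mse_sr = rng.random((64, 64))
    # Corrupt a few regions to mimic GAN artifacts.
    gan_sr = mse_sr + 0.3 * rng.standard_normal((64, 64)) * (rng.random((64, 64)) > 0.9)
    mask = artifact_mask(mse_sr, gan_sr)
    print("flagged pixels:", int(mask.sum()))
```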
{"title":"DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models","authors":"Liangbin Xie, Xintao Wang, Xiangyu Chen, Gengyan Li, Ying Shan, Jiantao Zhou, Chao Dong","doi":"10.48550/arXiv.2307.02457","DOIUrl":"https://doi.org/10.48550/arXiv.2307.02457","url":null,"abstract":"Image super-resolution (SR) with generative adversarial networks (GAN) has achieved great success in restoring realistic details. However, it is notorious that GAN-based SR models will inevitably produce unpleasant and undesirable artifacts, especially in practical scenarios. Previous works typically suppress artifacts with an extra loss penalty in the training phase. They only work for in-distribution artifact types generated during training. When applied in real-world scenarios, we observe that those improved methods still generate obviously annoying artifacts during inference. In this paper, we analyze the cause and characteristics of the GAN artifacts produced in unseen test data without ground-truths. We then develop a novel method, namely, DeSRA, to Detect and then Delete those SR Artifacts in practice. Specifically, we propose to measure a relative local variance distance from MSE-SR results and GAN-SR results, and locate the problematic areas based on the above distance and semantic-aware thresholds. After detecting the artifact regions, we develop a finetune procedure to improve GAN-based SR models with a few samples, so that they can deal with similar types of artifacts in more unseen real data. Equipped with our DeSRA, we can successfully eliminate artifacts from inference and improve the ability of SR models to be applied in real-world scenarios. The code will be available at https://github.com/TencentARC/DeSRA.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"580 1","pages":"38204-38226"},"PeriodicalIF":0.0,"publicationDate":"2023-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77365873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts
Dirk van der Hoeven, Ciara Pike-Burke, Haotian Qiu, N. Cesa-Bianchi
We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and upfront payments for all experts. We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $\mathcal{O}(K^2(\log T)\sqrt{T})$, where $K$ is the number of experts. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order $T^{2/3}$ one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data.
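To make the setting concrete, the following sketch simulates the interaction protocol only, not the authors' algorithm: in each round the learner pays every expert, observes their payment-dependent predictions, aggregates them, and incurs a cost combining the prediction error with the payments. The productivity function, the fixed-payment policy, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 3, 1000                        # number of experts, number of rounds
payments_weight = 1.0                 # weight on payments in the per-round cost

def productivity(expert, pay):
    """Unknown Lipschitz 'productivity': probability an expert predicts correctly.
    Illustrative stand-in; the learner never observes this function."""
    slopes = np.array([0.8, 0.5, 0.3])
    return 0.5 + 0.5 * np.minimum(1.0, slopes[expert] * pay)

total_cost = 0.0
for t in range(T):
    y = rng.integers(0, 2)                        # true binary label
    pay = np.full(K, 0.2)                         # naive fixed-payment policy (placeholder)
    correct = rng.random(K) < productivity(np.arange(K), pay)
    votes = np.where(correct, y, 1 - y)           # paid experts report noisy predictions
    y_hat = int(votes.mean() >= 0.5)              # majority-vote prediction
    total_cost += float(y_hat != y) + payments_weight * pay.sum()

print(f"average per-round cost: {total_cost / T:.3f}")
```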
{"title":"Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts","authors":"Dirk van der Hoeven, Ciara Pike-Burke, Haotian Qiu, N. Cesa-Bianchi","doi":"10.48550/arXiv.2307.00836","DOIUrl":"https://doi.org/10.48550/arXiv.2307.00836","url":null,"abstract":"We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz\"productivity\"function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and upfront payments for all experts. We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $mathcal{O}(K^2(log T)sqrt{T})$ where $K$ is the number of experts. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order $T^{2/3}$ one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"33 1","pages":"34809-34830"},"PeriodicalIF":0.0,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75201055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen, Z. Li, Lefei Zhang, Bo Du, Hai Zhao
The optimizer is an essential component for the success of deep learning, guiding the neural network to update its parameters according to the loss on the training set. SGD and Adam are two classical and effective optimizers on which researchers have proposed many variants, such as SGDM and RAdam. In this paper, we innovatively combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel \textsc{Admeta} (\textbf{A} \textbf{D}ouble exponential \textbf{M}oving averag\textbf{E} \textbf{T}o \textbf{A}daptive and non-adaptive momentum) optimizer framework. For the backward-looking part, we propose a DEMA variant scheme, motivated by a metric in the stock market, to replace the common exponential moving average scheme. In the forward-looking part, we present a dynamic lookahead strategy which asymptotically approaches a set value, maintaining its speed at the early stage and high convergence performance at the final stage. Based on this idea, we provide two optimizer implementations, \textsc{AdmetaR} and \textsc{AdmetaS}, the former based on RAdam and the latter based on SGDM. Through extensive experiments on diverse tasks, we find that the proposed \textsc{Admeta} optimizer outperforms our base optimizers and shows advantages over recently proposed competitive optimizers. We also provide theoretical proof of these two algorithms, which verifies the convergence of our proposed \textsc{Admeta}.
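The backward-looking ingredient is the double exponential moving average (DEMA) from technical analysis, DEMA = 2*EMA - EMA(EMA). The sketch below applies a DEMA of gradients as the momentum term in a plain SGD-style update; it is a simplified reading of the idea, not the exact AdmetaS or AdmetaR updates, and the hyperparameters are placeholders.

```python
import numpy as np

def sgd_dema_momentum(grad_fn, w0, lr=0.1, beta=0.9, steps=100):
    """SGDM-style update whose momentum is a DEMA of gradients.

    DEMA(g) = 2 * EMA(g) - EMA(EMA(g)) reacts faster than a plain EMA.
    Illustrative variant only, not the paper's exact update rule.
    """
    w = np.asarray(w0, dtype=float)
    ema1 = np.zeros_like(w)   # EMA of gradients
    ema2 = np.zeros_like(w)   # EMA of ema1
    for _ in range(steps):
        g = grad_fn(w)
        ema1 = beta * ema1 + (1 - beta) * g
        ema2 = beta * ema2 + (1 - beta) * ema1
        dema = 2 * ema1 - ema2
        w = w - lr * dema
    return w

# Minimize f(w) = ||w||^2 as a toy example; gradient is 2w.
w_star = sgd_dema_momentum(lambda w: 2 * w, w0=[3.0, -2.0])
print(w_star)
```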
{"title":"Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers","authors":"Yineng Chen, Z. Li, Lefei Zhang, Bo Du, Hai Zhao","doi":"10.48550/arXiv.2307.00631","DOIUrl":"https://doi.org/10.48550/arXiv.2307.00631","url":null,"abstract":"Optimizer is an essential component for the success of deep learning, which guides the neural network to update the parameters according to the loss on the training set. SGD and Adam are two classical and effective optimizers on which researchers have proposed many variants, such as SGDM and RAdam. In this paper, we innovatively combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel textsc{Admeta} (textbf{A} textbf{D}ouble exponential textbf{M}oving averagtextbf{E} textbf{T}o textbf{A}daptive and non-adaptive momentum) optimizer framework. For backward-looking part, we propose a DEMA variant scheme, which is motivated by a metric in the stock market, to replace the common exponential moving average scheme. While in the forward-looking part, we present a dynamic lookahead strategy which asymptotically approaches a set value, maintaining its speed at early stage and high convergence performance at final stage. Based on this idea, we provide two optimizer implementations, textsc{AdmetaR} and textsc{AdmetaS}, the former based on RAdam and the latter based on SGDM. Through extensive experiments on diverse tasks, we find that the proposed textsc{Admeta} optimizer outperforms our base optimizers and shows advantages over recently proposed competitive optimizers. We also provide theoretical proof of these two algorithms, which verifies the convergence of our proposed textsc{Admeta}.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"14 1","pages":"4764-4803"},"PeriodicalIF":0.0,"publicationDate":"2023-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87958071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat.
Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich

ML model design either starts with an interpretable model or a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and underperforming than their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising in performance, (2) identifies the relatively "harder" samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat.
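A rough skeleton of the route-interpret-repeat loop, with loud assumptions: shallow decision trees stand in for the FOL-based interpretable experts, a simple confidence rule stands in for the learned selector/router, and whatever remains unrouted plays the role of the residual. None of these specific choices come from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in: shallow trees play the role of the interpretable experts;
# the "residual" is simply whatever the experts do not claim.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
remaining = np.arange(len(X))
experts, coverage_target = [], 0.8

while len(experts) < 5 and 1 - len(remaining) / len(X) < coverage_target:
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X[remaining], y[remaining])
    conf = tree.predict_proba(X[remaining]).max(axis=1)
    routed = conf > 0.9                      # samples this expert claims (placeholder rule)
    if not routed.any():
        break
    experts.append(tree)
    remaining = remaining[~routed]           # the residual handles the rest

print(f"experts: {len(experts)}, residual share: {len(remaining) / len(X):.2f}")
```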

{"title":"Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat.","authors":"Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>ML model design either starts with an interpretable model or a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and underperforming than their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively <i>carve out</i> a mixture of interpretable experts (MoIE) and a <i>residual network</i>. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our <i>route, interpret, and repeat</i> approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising in performance, (2) identifies the relatively \"harder\" samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat.</p>","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"202 ","pages":"11360-11397"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500943/pdf/nihms-1915804.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10305812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat
Shantanu Ghosh, K. Yu, Forough Arabshahi, K. Batmanghelich
ML model design either starts with an interpretable model or a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and underperforming than their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising in performance, (2) identifies the relatively "harder" samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat.
{"title":"Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat","authors":"Shantanu Ghosh, K. Yu, Forough Arabshahi, K. Batmanghelich","doi":"10.48550/arXiv.2307.05350","DOIUrl":"https://doi.org/10.48550/arXiv.2307.05350","url":null,"abstract":"ML model design either starts with an interpretable model or a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and underperforming than their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising in performance, (2) identifies the relatively \"harder\" samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"61 1","pages":"11360-11397"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78317504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Geometric Autoencoders - What You See is What You Decode
Philipp Nazari, Sebastian Damrich, F. Hamprecht
Visualization is a crucial step in exploratory data analysis. One possible approach is to train an autoencoder with low-dimensional latent space. Large network depth and width can help unfolding the data. However, such expressive networks can achieve low reconstruction error even when the latent representation is distorted. To avoid such misleading visualizations, we propose first a differential geometric perspective on the decoder, leading to insightful diagnostics for an embedding's distortion, and second a new regularizer mitigating such distortion. Our "Geometric Autoencoder" avoids stretching the embedding spuriously, so that the visualization captures the data structure more faithfully. It also flags areas where little distortion could not be achieved, thus guarding against misinterpretation.
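One plausible way to realize the differential-geometric diagnostic is through the decoder's Jacobian: the pullback metric J^T J measures how a small latent patch is stretched by decoding, and the spread of its log-determinant across latent points gives a simple distortion penalty. The sketch below implements that reading on a toy decoder; the exact diagnostic and regularizer used in the paper may differ.

```python
import torch

decoder = torch.nn.Sequential(        # toy decoder: 2-D latent -> 10-D output
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 10)
)

def log_area_distortion(z):
    """log sqrt(det(J^T J)) of the decoder at latent point z: how much a
    small latent patch is stretched or shrunk by decoding."""
    J = torch.autograd.functional.jacobian(decoder, z)   # shape (10, 2)
    g = J.T @ J                                          # pullback metric
    return 0.5 * torch.logdet(g)

zs = torch.randn(16, 2)
logdets = torch.stack([log_area_distortion(z) for z in zs])
# A variance-style penalty pushes the decoder toward uniform local scaling,
# one simple way to regularize the distortion the diagnostic reveals.
# (To backpropagate through it, the Jacobian would need create_graph=True.)
geometric_penalty = logdets.var()
print(float(geometric_penalty))
```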
{"title":"Geometric Autoencoders - What You See is What You Decode","authors":"Philipp Nazari, Sebastian Damrich, F. Hamprecht","doi":"10.48550/arXiv.2306.17638","DOIUrl":"https://doi.org/10.48550/arXiv.2306.17638","url":null,"abstract":"Visualization is a crucial step in exploratory data analysis. One possible approach is to train an autoencoder with low-dimensional latent space. Large network depth and width can help unfolding the data. However, such expressive networks can achieve low reconstruction error even when the latent representation is distorted. To avoid such misleading visualizations, we propose first a differential geometric perspective on the decoder, leading to insightful diagnostics for an embedding's distortion, and second a new regularizer mitigating such distortion. Our ``Geometric Autoencoder'' avoids stretching the embedding spuriously, so that the visualization captures the data structure more faithfully. It also flags areas where little distortion could not be achieved, thus guarding against misinterpretation.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"21 1","pages":"25834-25857"},"PeriodicalIF":0.0,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73313822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Hierarchical Neural Coding for Controllable CAD Model Generation
Xiang Xu, P. Jayaraman, J. Lambourne, Karl D. D. Willis, Yasutaka Furukawa
This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree. Concretely, a novel variant of a vector quantized VAE with "masked skip connection" extracts design variations as neural codebooks at three levels. Two-stage cascaded auto-regressive transformers learn to generate code trees from incomplete CAD models and then complete CAD models following the intended design. Extensive experiments demonstrate superior performance on conventional tasks such as random generation while enabling novel interaction capabilities on conditional generation tasks. The code is available at https://github.com/samxuxiang/hnc-cad.
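The building block being varied here is the vector-quantization bottleneck of a VQ-VAE. The sketch shows a single-level codebook lookup with the standard straight-through gradient; the paper's three-level hierarchy, masked skip connections, and cascaded transformers are not modeled.

```python
import torch

class VectorQuantizer(torch.nn.Module):
    """Single-level VQ bottleneck: snap each latent vector to its nearest
    codebook entry, with a straight-through gradient. The paper stacks such
    codebooks at three levels; this shows only one, as an illustration."""
    def __init__(self, num_codes=64, dim=16):
        super().__init__()
        self.codebook = torch.nn.Embedding(num_codes, dim)

    def forward(self, z):
        d = torch.cdist(z, self.codebook.weight)        # (N, num_codes) distances
        idx = d.argmin(dim=1)                           # nearest code per vector
        zq = self.codebook(idx)
        zq = z + (zq - z).detach()                      # straight-through estimator
        return zq, idx

vq = VectorQuantizer()
z = torch.randn(8, 16, requires_grad=True)
zq, idx = vq(z)
zq.sum().backward()                                     # gradients flow back to z
print(idx.tolist())
```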
{"title":"Hierarchical Neural Coding for Controllable CAD Model Generation","authors":"Xiang Xu, P. Jayaraman, J. Lambourne, Karl D. D. Willis, Yasutaka Furukawa","doi":"10.48550/arXiv.2307.00149","DOIUrl":"https://doi.org/10.48550/arXiv.2307.00149","url":null,"abstract":"This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree. Concretely, a novel variant of a vector quantized VAE with\"masked skip connection\"extracts design variations as neural codebooks at three levels. Two-stage cascaded auto-regressive transformers learn to generate code trees from incomplete CAD models and then complete CAD models following the intended design. Extensive experiments demonstrate superior performance on conventional tasks such as random generation while enabling novel interaction capabilities on conditional generation tasks. The code is available at https://github.com/samxuxiang/hnc-cad.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"78 1","pages":"38443-38461"},"PeriodicalIF":0.0,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77949689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations
Yongyi Yang, J. Steinhardt, Wei Hu
Recent work has observed an intriguing ''Neural Collapse'' phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately capture the intrinsic structure of input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve $93\%$ accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.
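The label-reconstruction experiment can be mimicked end to end on synthetic features: cluster last-layer representations into 10 groups and match clusters to fine labels with the Hungarian algorithm. The sketch below uses synthetic Gaussian features in place of a trained network's representations; only the clustering-and-matching recipe is taken from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

# Synthetic stand-ins for last-layer features of a model trained on coarse
# labels: 10 fine classes that remain separated despite apparent collapse.
rng = np.random.default_rng(0)
centers = rng.standard_normal((10, 32))
fine_labels = rng.integers(0, 10, size=2000)
features = centers[fine_labels] + 0.1 * rng.standard_normal((2000, 32))

clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(features)

# Match clusters to fine labels with the Hungarian algorithm, then score.
cost = np.zeros((10, 10))
for c, y in zip(clusters, fine_labels):
    cost[c, y] -= 1
row, col = linear_sum_assignment(cost)
mapping = dict(zip(row, col))
acc = np.mean([mapping[c] == y for c, y in zip(clusters, fine_labels)])
print(f"reconstructed-label accuracy: {acc:.3f}")
```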
{"title":"Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations","authors":"Yongyi Yang, J. Steinhardt, Wei Hu","doi":"10.48550/arXiv.2306.17105","DOIUrl":"https://doi.org/10.48550/arXiv.2306.17105","url":null,"abstract":"Recent work has observed an intriguing ''Neural Collapse'' phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately captures the intrinsic structure of input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve $93%$ accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"140 1","pages":"39453-39487"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77767147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
DUET: 2D Structured and Approximately Equivariant Representations
Xavier Suau, Federico Danieli, Thomas Anderson Keller, Arno Blaas, Chen Huang, Jason Ramapuram, Dan Busbridge, L. Zappella
Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2D representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation, while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy for several discriminative tasks, and improves transfer learning.
{"title":"DUET: 2D Structured and Approximately Equivariant Representations","authors":"Xavier Suau, Federico Danieli, Thomas Anderson Keller, Arno Blaas, Chen Huang, Jason Ramapuram, Dan Busbridge, L. Zappella","doi":"10.48550/arXiv.2306.16058","DOIUrl":"https://doi.org/10.48550/arXiv.2306.16058","url":null,"abstract":"Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation, while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy for several discriminative tasks, and improves transfer learning.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"21 1","pages":"32749-32769"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84980102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Curious Replay for Model-based Adaptation
Isaac Kauvar, Christopher Doyle, Linqi Zhou, N. Haber
Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay -- a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 by DreamerV3 with uniform replay, while also maintaining similar performance on the Deepmind Control Suite. Code for Curious Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay.
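As a rough illustration of the idea, the sketch below is a prioritized replay buffer whose sampling probabilities track a curiosity-style signal (here, an arbitrary per-transition model error). It is not the exact priority signal used with DreamerV3 in the paper, and the exponent and error values are placeholders.

```python
import numpy as np

class CuriousReplayBuffer:
    """Prioritized buffer sketch: transitions the world model predicts poorly
    (high 'curiosity') are replayed more often. Illustrative only."""
    def __init__(self, alpha=0.7):
        self.items, self.priorities, self.alpha = [], [], alpha

    def add(self, transition, model_error=1.0):
        self.items.append(transition)
        self.priorities.append(model_error)      # new experience starts salient

    def sample(self, batch_size, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = rng.choice(len(self.items), size=batch_size, p=p)
        return idx, [self.items[i] for i in idx]

    def update(self, idx, new_errors):
        for i, e in zip(idx, new_errors):
            self.priorities[i] = float(e)        # refresh priorities after training

buf = CuriousReplayBuffer()
for t in range(100):
    buf.add(("obs", t), model_error=np.random.rand())
idx, batch = buf.sample(8)
buf.update(idx, np.random.rand(8))
print(len(batch))
```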
{"title":"Curious Replay for Model-based Adaptation","authors":"Isaac Kauvar, Christopher Doyle, Linqi Zhou, N. Haber","doi":"10.48550/arXiv.2306.15934","DOIUrl":"https://doi.org/10.48550/arXiv.2306.15934","url":null,"abstract":"Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay -- a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 by DreamerV3 with uniform replay, while also maintaining similar performance on the Deepmind Control Suite. Code for Curious Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"11 1","pages":"16018-16048"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75194404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2