DCID: Deep Canonical Information Decomposition

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference) Pub Date : 2023-06-27 DOI:10.48550/arXiv.2306.15619

Alexander Rakowski, C. Lippert

{"title":"DCID: Deep Canonical Information Decomposition","authors":"Alexander Rakowski, C. Lippert","doi":"10.48550/arXiv.2306.15619","DOIUrl":null,"url":null,"abstract":"We consider the problem of identifying the signal shared between two one-dimensional target variables, in the presence of additional multivariate observations. Canonical Correlation Analysis (CCA)-based methods have traditionally been used to identify shared variables, however, they were designed for multivariate targets and only offer trivial solutions for univariate cases. In the context of Multi-Task Learning (MTL), various models were postulated to learn features that are sparse and shared across multiple tasks. However, these methods were typically evaluated by their predictive performance. To the best of our knowledge, no prior studies systematically evaluated models in terms of correctly recovering the shared signal. Here, we formalize the setting of univariate shared information retrieval, and propose ICM, an evaluation metric which can be used in the presence of ground-truth labels, quantifying 3 aspects of the learned shared features. We further propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables. We benchmark the models on a range of scenarios on synthetic data with known ground-truths and observe DCID outperforming the baselines in a wide range of settings. Finally, we demonstrate a real-life application of DCID on brain Magnetic Resonance Imaging (MRI) data, where we are able to extract more accurate predictors of changes in brain regions and obesity. The code for our experiments as well as the supplementary materials are available at https://github.com/alexrakowski/dcid","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2306.15619","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

We consider the problem of identifying the signal shared between two one-dimensional target variables, in the presence of additional multivariate observations. Canonical Correlation Analysis (CCA)-based methods have traditionally been used to identify shared variables, however, they were designed for multivariate targets and only offer trivial solutions for univariate cases. In the context of Multi-Task Learning (MTL), various models were postulated to learn features that are sparse and shared across multiple tasks. However, these methods were typically evaluated by their predictive performance. To the best of our knowledge, no prior studies systematically evaluated models in terms of correctly recovering the shared signal. Here, we formalize the setting of univariate shared information retrieval, and propose ICM, an evaluation metric which can be used in the presence of ground-truth labels, quantifying 3 aspects of the learned shared features. We further propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables. We benchmark the models on a range of scenarios on synthetic data with known ground-truths and observe DCID outperforming the baselines in a wide range of settings. Finally, we demonstrate a real-life application of DCID on brain Magnetic Resonance Imaging (MRI) data, where we are able to extract more accurate predictors of changes in brain regions and obesity. The code for our experiments as well as the supplementary materials are available at https://github.com/alexrakowski/dcid

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

DCID:深度规范信息分解

我们考虑在存在额外的多变量观测值的情况下，识别两个一维目标变量之间共享的信号的问题。基于典型相关分析(CCA)的方法传统上用于识别共享变量，然而，它们是为多变量目标设计的，只能为单变量情况提供平凡的解决方案。在多任务学习(MTL)的背景下，假设了各种模型来学习稀疏且跨多个任务共享的特征。然而，这些方法通常是通过其预测性能来评估的。据我们所知，之前没有研究系统地评估了正确恢复共享信号的模型。在这里，我们形式化了单变量共享信息检索的设置，并提出了ICM，一种可以在存在真值标签的情况下使用的评估度量，量化了学习到的共享特征的三个方面。我们进一步提出了深度规范信息分解(DCID)——一种简单而有效的学习共享变量的方法。我们在已知的基本事实的合成数据的一系列场景中对模型进行基准测试，并观察到DCID在广泛的设置中优于基线。最后，我们展示了DCID在脑磁共振成像(MRI)数据上的实际应用，我们能够提取出更准确的脑区域变化和肥胖预测因子。我们的实验代码以及补充材料可以在https://github.com/alexrakowski/dcid上找到

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)

自引率

0.00%

发文量