Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration.

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Journal of Machine Learning Research Pub Date : 2016-01-01

Lu Tang, Peter X K Song

{"title":"Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration.","authors":"Lu Tang, Peter X K Song","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>As data sets of related studies become more easily accessible, combining data sets of similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major challenge arising from data integration pertains to data heterogeneity in terms of study population, study design, or study coordination. Ignoring such heterogeneity in data analysis may result in biased estimation and misleading inference. Traditional techniques of remedy to data heterogeneity include the use of interactions and random effects, which are inferior to achieving desirable statistical power or providing a meaningful interpretation, especially when a large number of smaller data sets are combined. In this paper, we propose a regularized fusion method that allows us to identify and merge inter-study homogeneous parameter clusters in regression analysis, without the use of hypothesis testing approach. Using the fused lasso, we establish a computationally efficient procedure to deal with large-scale integrated data. Incorporating the estimated parameter ordering in the fused lasso facilitates computing speed with no loss of statistical power. We conduct extensive simulation studies and provide an application example to demonstrate the performance of the new method with a comparison to the conventional methods.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"17 ","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647925/pdf/nihms872528.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Machine Learning Research","FirstCategoryId":"94","ListUrlMain":"","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

As data sets of related studies become more easily accessible, combining data sets of similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major challenge arising from data integration pertains to data heterogeneity in terms of study population, study design, or study coordination. Ignoring such heterogeneity in data analysis may result in biased estimation and misleading inference. Traditional techniques of remedy to data heterogeneity include the use of interactions and random effects, which are inferior to achieving desirable statistical power or providing a meaningful interpretation, especially when a large number of smaller data sets are combined. In this paper, we propose a regularized fusion method that allows us to identify and merge inter-study homogeneous parameter clusters in regression analysis, without the use of hypothesis testing approach. Using the fused lasso, we establish a computationally efficient procedure to deal with large-scale integrated data. Incorporating the estimated parameter ordering in the fused lasso facilitates computing speed with no loss of statistical power. We conduct extensive simulation studies and provide an application example to demonstrate the performance of the new method with a comparison to the conventional methods.

Abstract Image

微信好友朋友圈 QQ好友复制链接

本刊更多论文

回归系数聚类的融合Lasso方法——数据集成中参数异质性的学习。

随着相关研究的数据集越来越容易获取，在实践中往往会将类似研究的数据集进行组合，以获得更大的样本量和更高的功率。数据整合带来的主要挑战涉及研究人群、研究设计或研究协调方面的数据异质性。在数据分析中忽略这种异质性可能会导致有偏差的估计和误导性的推断。补救数据异质性的传统技术包括使用相互作用和随机效应，它们不如达到理想的统计能力或提供有意义的解释，特别是当大量较小的数据集组合在一起时。在本文中，我们提出了一种正则化融合方法，使我们能够在回归分析中识别和合并研究间的同质参数簇，而无需使用假设检验方法。利用融合套索，我们建立了一个计算效率高的处理大规模集成数据的程序。在融合套索中加入估计的参数排序，在不损失统计能力的情况下提高了计算速度。我们进行了大量的仿真研究，并提供了一个应用实例来证明新方法的性能，并与传统方法进行了比较。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Machine Learning Research 工程技术-计算机：人工智能

CiteScore

18.80

自引率

0.00%

发文量

审稿时长

3 months

期刊介绍： The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR seeks previously unpublished papers on machine learning that contain: new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature; experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.

期刊最新文献

Flexible Bayesian Product Mixture Models for Vector Autoregressions. Convergence for nonconvex ADMM, with applications to CT imaging. Effect-Invariant Mechanisms for Policy Generalization. Nonparametric Regression for 3D Point Cloud Learning. Graphical Dirichlet Process for Clustering Non-Exchangeable Grouped Data.