The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies
Yuwen Liu, Jie Ren, Shuangge Ma, Cen Wu
Statistics in Medicine (Impact Factor 1.8; JCR Q3, Mathematical & Computational Biology), published 2024-09-11
DOI: 10.1002/sim.10196 (https://doi.org/10.1002/sim.10196)
Citation count: 0
Abstract
Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to nonrobust ones for identifying important genes associated with heterogeneous disease traits and building superior predictive models. In this study, to retain the attractive features of the quantile LASSO and fully Bayesian regularized quantile regression while overcoming their disadvantage in the analysis of high-dimensional genomics data, we propose the spike-and-slab quantile LASSO through a fully Bayesian spike-and-slab formulation under a robust likelihood based on the asymmetric Laplace distribution (ALD). The proposed robust method inherits the properties of selective shrinkage and self-adaptivity to the sparsity pattern from the spike-and-slab LASSO (Ročková and George, J Am Stat Assoc, 2018, 113(521): 431–444). Furthermore, the spike-and-slab quantile LASSO has a computational advantage: it locates the posterior modes via Expectation-Maximization (EM) steps guided by the soft-thresholding rule within the coordinate descent framework, a property rarely available for robust regularization with nondifferentiable loss functions. We have conducted comprehensive simulation studies with a variety of heavy-tailed errors in both homogeneous and heterogeneous model settings to demonstrate the superiority of the spike-and-slab quantile LASSO over competing methods. The advantage of the proposed method is further demonstrated in case studies of the lung adenocarcinoma (LUAD) and skin cutaneous melanoma (SKCM) data from The Cancer Genome Atlas (TCGA).
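The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, parameter names, and example values are assumptions made for illustration, and the adaptive penalty follows the standard spike-and-slab LASSO form from Ročková and George (2018), which may differ in detail from the formulation in this paper. The sketch shows the quantile check loss, the ALD negative log-likelihood built from it, the soft-thresholding operator, and how the spike-and-slab mixture penalizes small coefficients heavily and large ones lightly.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def ald_neg_log_likelihood(residuals, tau, sigma=1.0):
    """Negative log-likelihood of residuals under the ALD with density
    f(u) = tau * (1 - tau) / sigma * exp(-rho_tau(u) / sigma)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    return check_loss(residuals, tau).sum() / sigma - n * np.log(tau * (1 - tau) / sigma)


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def adaptive_penalty(beta, lam_spike, lam_slab, theta):
    """Coefficient-specific penalty from a two-component Laplace mixture
    (spike-and-slab LASSO form; assumed here, not taken from this paper).

    p_star is the conditional probability that beta comes from the slab;
    the penalty is the p_star-weighted average of the two rates."""
    abs_beta = np.abs(np.asarray(beta, dtype=float))
    slab = theta * 0.5 * lam_slab * np.exp(-lam_slab * abs_beta)
    spike = (1.0 - theta) * 0.5 * lam_spike * np.exp(-lam_spike * abs_beta)
    p_star = slab / (slab + spike)
    return lam_slab * p_star + lam_spike * (1.0 - p_star)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the selective-shrinkage pattern.
    for b in (0.0, 0.1, 0.5, 5.0):
        print(b, float(adaptive_penalty(b, lam_spike=20.0, lam_slab=1.0, theta=0.5)))
```

Minimizing the ALD negative log-likelihood over the regression coefficients is equivalent to minimizing the summed check loss, which is why the ALD serves as a working likelihood for quantile regression. The adaptive penalty stays close to the spike rate near zero and drops toward the slab rate for large coefficients.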
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalization of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.