Yunlong Nie, Eugene Opoku, Laila Yasmin, Yin Song, Jie Wang, Sidi Wu, Vanessa Scarapicchia, Jodie Gawryluk, Liangliang Wang, Jiguo Cao, Farouk S Nathoo
We conduct an imaging genetics study to explore how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease and mild cognitive impairment. We develop an analysis of longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) and genetic data obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. A dynamic causal model (DCM) is fit to the rs-fMRI scans to estimate effective brain connectivity within the DMN, and the estimates are related to a set of single nucleotide polymorphisms (SNPs) contained in an empirical disease-constrained set, which is obtained out-of-sample from 663 ADNI subjects having only genome-wide data. We relate longitudinal effective brain connectivity estimated using spectral DCM to SNPs using both linear mixed effect (LME) models and function-on-scalar regression (FSR). In both cases we implement a parametric bootstrap for testing SNP coefficients and make comparisons with p-values obtained from asymptotic null distributions. In both networks, no effects are found at an initial q-value threshold of 0.1. We report on exploratory patterns of associations with relatively high ranks that exhibit stability to the differing assumptions made by FSR and LME.
Spectral dynamic causal modelling of resting-state fMRI: an exploratory study relating effective brain connectivity in the default mode network to genetics. Statistical Applications in Genetics and Molecular Biology 19(3), 2020-08-31. https://doi.org/10.1515/sagmb-2019-0058
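The q-value screening step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the q-values are Benjamini-Hochberg adjusted p-values; the abstract does not say which estimator the authors used, and the function names and p-values below are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def discoveries(pvalues, threshold=0.1):
    """Indices whose adjusted p-value is at or below the threshold."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q <= threshold]


# Hypothetical p-values for four SNP coefficients (not from the study).
example = [0.01, 0.04, 0.03, 0.20]
flagged = discoveries(example, threshold=0.1)
```

In the study itself no SNP passed the 0.1 threshold, which corresponds to this function returning an empty list.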
DNA methylation and gene expression are interdependent and both implicated in cancer development and progression, with many individual biomarkers discovered. A joint analysis of the two data types can potentially lead to biological insights that are not discoverable with separate analyses. To optimally leverage the joint data for identifying perturbed genes and classifying clinical cancer samples, it is important to accurately model the interactions between the two data types. Here, we present EBADIMEX for jointly identifying differential expression and methylation and classifying samples. The moderated t-test widely used with empirical Bayes priors in current differential expression methods is generalised to a multivariate setting by developing: (1) a moderated Welch t-test for equality of means with unequal variances; (2) a moderated F-test for equality of variances; and (3) a multivariate test for equality of means with equal variances. This leads to parametric models with prior distributions for the parameters, which allow fast evaluation and robust analysis of small data sets. EBADIMEX is demonstrated on simulated data as well as a large breast cancer (BRCA) cohort from TCGA. We show that the use of empirical Bayes priors and moderated tests works particularly well on small data sets.
EBADIMEX: an empirical Bayes approach to detect joint differential expression and methylation and to classify samples. Tobias Madsen, Michał Świtnicki, Malene Juul, Jakob Skou Pedersen. Statistical Applications in Genetics and Molecular Biology 4(1), 2019-11-16. https://doi.org/10.1515/sagmb-2018-0050
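The idea of moderating a Welch-type statistic can be sketched as follows. This is not the EBADIMEX statistic itself: it simply combines the usual empirical Bayes variance shrinkage (a weighted average of a prior variance and the sample variance) with the Welch form of the standard error. The prior variance s0_sq and prior degrees of freedom d0 are hypothetical inputs here, whereas an empirical Bayes method would estimate them from all genes, and the null distribution of the statistic is not addressed.

```python
import math


def sample_variance(x):
    """Unbiased sample variance."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def moderated_variance(s2, d, s0_sq, d0):
    """Shrink a sample variance s2 (d degrees of freedom) toward the
    prior variance s0_sq, which carries d0 prior degrees of freedom."""
    return (d0 * s0_sq + d * s2) / (d0 + d)


def moderated_welch_t(x, y, s0_sq, d0):
    """Welch-type statistic with each group's variance shrunk separately."""
    nx, ny = len(x), len(y)
    vx = moderated_variance(sample_variance(x), nx - 1, s0_sq, d0)
    vy = moderated_variance(sample_variance(y), ny - 1, s0_sq, d0)
    se = math.sqrt(vx / nx + vy / ny)
    return (sum(x) / nx - sum(y) / ny) / se
```

With d0 = 0 the statistic reduces to the ordinary Welch t-statistic. Larger d0 pulls both group variances toward s0_sq, which is what stabilises the test when each group has only a few samples.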
Genomic imprinting is a parent-of-origin effect apparent in an appreciable number of human diseases. We have proposed the new imprinting test statistic MOBIT, which is based on MOD score analysis. We were interested in the properties of the MOBIT concerning its distribution under three hypotheses: (1) H0,a: no linkage, no imprinting; (2) H0,b: linkage, no imprinting; (3) H1: linkage and imprinting. More specifically, we assessed the confounding between imprinting and sex-specific recombination frequencies, which presents a major difficulty in linkage-based testing for imprinting, and evaluated the power of the test. To this end, we performed a linkage simulation study of affected sib-pairs and a three-generation pedigree with two trait models, many two-point and multipoint marker scenarios, three genetic map ratios, two sample sizes, and five imprinting degrees. We also investigated the ability of the MOBIT to quantify the degree of imprinting and applied the MOBIT to a real data example on house dust mite allergy. We further proposed and evaluated two approaches to obtain empiric p values for the MOBIT. Our results showed that two-point analyses assuming a sex-averaged marker map led to an inflated type I error due to confounding, especially for a larger marker-trait locus distance. When the correct sex-specific marker map was assumed, two-point analyses had reduced power to detect imprinting compared to sex-averaged analyses with an appropriate correction for the inflation of the test statistic. However, confounding was not an issue in multipoint analysis unless the map ratio was extreme and marker spacing was sparse. With multipoint analysis, power as well as the ability to quantify the imprinting degree were almost equally high whether a sex-averaged or the correct sex-specific map was used in the analysis. We recommend obtaining empiric p values for the MOBIT using genotype simulations based on the best-fitting nonimprinting model of the real dataset analysis. In addition, an implementation of a method based on the permutation of parental sexes is also available. In summary, we propose to perform multipoint analyses using densely spaced markers to efficiently discover new imprinted loci and to reliably quantify the degree of imprinting.
Properties and Evaluation of the MOBIT - a novel Linkage-based Test Statistic and Quantification Method for Imprinting. Markus Brugger, Michael Knapp, Konstantin Strauch. Statistical Applications in Genetics and Molecular Biology 18(4), 2019-07-10. https://doi.org/10.1515/sagmb-2018-0025
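The final step of a simulation-based p value can be sketched as follows, assuming the null replicates of the test statistic have already been generated. The genotype simulation under the best-fitting nonimprinting model is outside this sketch. The add-one correction is a common convention for Monte Carlo p values and is an assumption here, not necessarily the authors' choice. The numbers are hypothetical.

```python
def empirical_p_value(observed, simulated):
    """Proportion of null replicates at least as large as the observed
    statistic, with the add-one correction so the result is never zero."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Hypothetical observed statistic and four null replicates.
p = empirical_p_value(3.0, [1.0, 2.0, 3.0, 4.0])
```

Because the smallest attainable value is 1 / (N + 1), the number of replicates N bounds how small an empiric p value can be reported.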
Christian M Page, Linda Vos, Trine B Rounge, Hanne F Harbo, Bettina K Andreassen
DNA methylation plays an important role in human health and disease, and methods for the identification of differentially methylated regions are of increasing interest. There is currently a lack of statistical methods that properly address multiple testing, i.e. control genome-wide significance for differentially methylated regions. We introduce a scan statistic (DMRScan), which overcomes these limitations. We benchmark DMRScan against two well-established methods (bumphunter, DMRcate), using a simulation study based on real methylation data. An implementation of DMRScan is available from Bioconductor. Our method has higher power than alternative methods across different simulation scenarios, particularly for small effect sizes. DMRScan exhibits greater flexibility in statistical modeling and can be used with more complex designs than current methods. DMRScan is the first dynamic approach that properly addresses the multiple-testing challenges for the identification of differentially methylated regions. DMRScan outperformed alternative methods in terms of power, while keeping the false discovery rate controlled.
Assessing genome-wide significance for the detection of differentially methylated regions. Statistical Applications in Genetics and Molecular Biology 17(5), 2018-09-19. https://doi.org/10.1515/sagmb-2017-0050
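The general shape of a window-based scan statistic can be shown with a toy sketch. It assumes evenly spaced probes, a fixed window width, and a simple window sum. DMRScan itself works with genomic distances and calibrated genome-wide thresholds, neither of which is reproduced here. The per-probe statistics are hypothetical.

```python
def window_scan(stats, width):
    """Return (start_index, window_sum) for the fixed-width window
    with the largest sum of per-probe statistics."""
    if width < 1 or width > len(stats):
        raise ValueError("width must be between 1 and len(stats)")
    current = sum(stats[:width])
    best_start, best_sum = 0, current
    for start in range(1, len(stats) - width + 1):
        # Slide the window by one probe: add the new entry, drop the old one.
        current += stats[start + width - 1] - stats[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_sum


# Hypothetical per-probe statistics with an elevated stretch in the middle.
best = window_scan([0, 1, 5, 4, 6, 1], width=3)
```

The multiple-testing difficulty the abstract refers to comes from the overlap between windows: their sums are strongly dependent, so the maximum cannot be calibrated as if the windows were independent tests.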
Omic data are characterized by the presence of strong dependence structures that result either from data acquisition or from some underlying biological processes. Applying statistical procedures that do not adjust the variable selection step to the dependence pattern may result in a loss of power and the selection of spurious variables. The goal of this paper is to propose a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence in which the responses of a given individual can be modelled as a time series. We propose a novel Lasso-based approach within the framework of the multivariate linear model that takes the dependence structure into account by using different types of stationary-process covariance structures for the random error matrix. Our numerical experiments show that including the estimation of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves the variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set made of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).
A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data. Marie Perrot-Dockès, Céline Lévy-Leduc, Julien Chiquet, Laure Sansonnet, Margaux Brégère, Marie-Pierre Étienne, Stéphane Robin, Grégory Genta-Jouve. Statistical Applications in Genetics and Molecular Biology 17(5), 2018-09-08. https://doi.org/10.1515/sagmb-2017-0077
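The whitening idea behind accounting for time-series dependence can be sketched for the simplest assumed case, an AR(1) error process. The abstract mentions several stationary covariance structures; AR(1) is chosen here only for illustration, the function names are hypothetical, and this is not the MultiVarSel API. Only the whitening step is shown, not the subsequent Lasso fit.

```python
import math


def ar1_coefficient(series):
    """Estimate the AR(1) coefficient by the lag-1 sample autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return num / denom


def whiten_ar1(series, phi):
    """Remove AR(1) dependence: scale the first entry by sqrt(1 - phi^2)
    and replace each later entry e_t by e_t - phi * e_{t-1}."""
    first = math.sqrt(1.0 - phi * phi) * series[0]
    rest = [series[t] - phi * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# A series that decays exactly by the factor phi = 0.5 whitens to zeros
# after the first entry.
whitened = whiten_ar1([1.0, 0.5, 0.25], 0.5)
```

Once the residual rows are whitened in this way, a standard Lasso criterion that assumes uncorrelated errors can be applied to the transformed model.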
Biology challenging statistics. Michael P H Stumpf. Statistical Applications in Genetics and Molecular Biology 17(4), 2018-08-30. https://doi.org/10.1515/sagmb-2018-0048
Editorial change at Statistical Applications in Genetics and Molecular Biology. Torsten Krüger. Statistical Applications in Genetics and Molecular Biology 17(4), 2018-08-24. https://doi.org/10.1515/sagmb-2018-0046
Nimisha Chaturvedi, Renée X de Menezes, Jelle J Goeman, Wessel van Wieringen
Integrative analysis of copy number and gene expression data can help in understanding the cis and trans effects of copy number aberrations on transcription levels of genes involved in a pathway. To analyse how these copy-number-mediated gene-gene interactions differ between groups of samples, we propose a new method, named dNET. Our method uses ridge regression to model the network topology involving one gene's expression level, its gene dosage and the expression levels of other genes in the network. The interaction parameters are estimated by fitting the model per gene for all samples together. However, instead of testing for differential network topology per gene, dNET tests for an overall difference in estimated parameters between two groups of samples and produces a single p-value. With the help of several simulation studies, we show that dNET can detect differential network nodes with high accuracy and a low rate of false positives even in the presence of differential cis effects. We also apply dNET to publicly available TCGA cancer datasets and identify pathways where copy-number-mediated gene-gene interactions differ between samples with cancer stage lower than stage 3 and samples with cancer stage 3 or above.
A test for detecting differential indirect trans effects between two groups of samples. Statistical Applications in Genetics and Molecular Biology 17(5), 2018-07-31. https://doi.org/10.1515/sagmb-2017-0058
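The ridge regression building block can be sketched in closed form, solving (X'X + lambda I) beta = X'y. The sketch assumes that the columns of X hold a gene's own copy number and the expression levels of the other genes in the network, and that y holds the gene's expression level. The function name and data are hypothetical, and the dNET test for a group difference in the estimated parameters is not reproduced.

```python
def ridge_fit(X, y, lam):
    """Solve (X'X + lam * I) beta = X'y by Gaussian elimination."""
    p = len(X[0])
    # Normal equations with the ridge penalty added to the diagonal.
    A = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(A[r][k] * beta[k] for k in range(r + 1, p))
        beta[r] = (b[r] - tail) / A[r][r]
    return beta


# Toy design with orthonormal columns: the penalty shrinks each
# least-squares coefficient by the factor 1 / (1 + lam).
beta = ridge_fit([[1, 0], [0, 1]], [2, 4], lam=1.0)
```

The penalty keeps the system solvable even when a network has more genes than samples, which is the usual reason for choosing ridge over ordinary least squares in this setting.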
The prediction of cancer prognosis and metastatic potential immediately after the initial diagnosis is a major challenge in current clinical research. The relevance of such a signature is clear, as it will free many patients from the agony and toxic side-effects associated with the adjuvant chemotherapy automatically and sometimes carelessly prescribed to them. Motivated by this issue, several previous works presented a Bayesian model which led to the following conclusion: thousands of samples are needed to generate a robust gene list for predicting outcome. This conclusion rests on several statistical assumptions, including asymptotic independence of sample correlations. The current work makes two main contributions: (1) It shows that while the assumptions of the Bayesian model discussed by previous papers seem to be non-restrictive, they are quite strong. To demonstrate this point, it is shown that some standard sparse and Gaussian models are not included in the set of models which are mathematically consistent with these assumptions. (2) It is shown that the empirical Bayes methodology which was applied in order to test the relevant assumptions does not detect severe violations, and consequently an overestimation of the required sample size might be incurred. Finally, we suggest that, under some regularity conditions, the current theoretical results may be used to develop a new method to test the asymptotic independence assumption.
On the relation between the true and sample correlations under Bayesian modelling of gene expression datasets. Royi Jacobovic. Statistical Applications in Genetics and Molecular Biology 17(4), 2018-07-14. https://doi.org/10.1515/sagmb-2017-0068
Naveen K Bansal, Mehdi Maadooliat, Steven J Schrodi
We consider a multiple hypotheses problem with directional alternatives in a decision theoretic framework. We obtain an empirical Bayes rule subject to a constraint on mixed directional false discovery rate (mdFDR≤α) under the semiparametric setting where the distribution of the test statistic is parametric, but the prior distribution is nonparametric. We propose separate priors for the left tail and right tail alternatives, as may be required in many applications. The proposed Bayes rule is compared through simulation against rules proposed by Benjamini and Yekutieli and Efron. We illustrate the proposed methodology for two sets of data from biological experiments: HIV-transfected cell-line mRNA expression data, and a quantitative trait genome-wide SNP data set. We have developed a user-friendly web-based Shiny app for the proposed method, which is available at https://npseb.shinyapps.io/npseb/. The HIV and SNP data can be directly accessed, and the results presented in this paper can be reproduced.
Empirical Bayesian approach to testing multiple hypotheses with separate priors for left and right alternatives. Statistical Applications in Genetics and Molecular Biology 17(4), 2018-07-05. https://doi.org/10.1515/sagmb-2018-0002
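The error measure constrained by the rule can be made concrete with a small sketch that computes the mixed directional false discovery proportion for one simulated data set. The mdFDR is the expectation of this proportion. The coding of decisions and true states as -1, 0 and +1 is an assumption made for illustration, and the example vectors are hypothetical.

```python
def mixed_directional_fdp(decisions, truths):
    """Fraction of discoveries that are wrong, where a discovery
    (decision != 0) is wrong if the true state is null (0) or has the
    opposite sign. Returns 0.0 when nothing is declared."""
    discoveries = [(d, t) for d, t in zip(decisions, truths) if d != 0]
    if not discoveries:
        return 0.0
    errors = sum(1 for d, t in discoveries if d != t)
    return errors / len(discoveries)


# Four discoveries: one correct right-tail call, one directional error,
# one false positive on a true null, and one correct left-tail call.
fdp = mixed_directional_fdp([1, -1, 0, 1, -1], [1, 1, 0, 0, -1])
```

Averaging this proportion over many simulated data sets gives a Monte Carlo estimate of the mdFDR, which can then be checked against the nominal level α.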