Pub Date : 2023-12-01 Epub Date: 2022-07-08 DOI: 10.1007/s12561-022-09350-w
A Unified Bayesian Framework for Bi-overlapping-Clustering Multi-omics Data via Sparse Matrix Factorization
Journal: Statistics in Biosciences
Fangting Zhou, Kejun He, James J Cai, Laurie A Davidson, Robert S Chapkin, Yang Ni
Advances in modern sequencing techniques have generated an unprecedented amount of multi-omics data, which provide great opportunities to quantitatively explore functional genomes from different but complementary perspectives. However, distinct modalities and sequencing technologies generate diverse types of data, which greatly complicates statistical modeling because each type of data requires its own specially optimized method. In this paper, we propose a unified Bayesian nonparametric matrix factorization framework that infers overlapping bi-clusters from multi-omics data. The proposed method adaptively discretizes different types of observations into common latent states, on which cluster structures are built hierarchically. Being nonparametric, the method automatically determines the number of clusters. We demonstrate its utility through simulation studies and applications to four datasets: a single-cell RNA-sequencing dataset, a combined single-cell RNA-sequencing and single-cell ATAC-sequencing dataset, a bulk RNA-sequencing dataset, and a DNA methylation dataset. These analyses reveal several interesting findings that are consistent with the biological literature.
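The abstract states that the number of overlapping clusters is learned from the data rather than fixed in advance, but it does not name the prior used. One standard Bayesian nonparametric prior for overlapping memberships with an unbounded number of clusters is the Indian buffet process (IBP); the sketch below assumes that choice purely for illustration. It draws a binary matrix `Z` in which `Z[g][k] = 1` means row `g` (a gene or a cell, say) belongs to cluster `k`. The helper names `sample_ibp` and `poisson` are hypothetical, and this is not the authors' model, which additionally discretizes observations into latent states and factorizes the data matrix.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_rows, alpha, seed=0):
    """Sample a binary membership matrix from an Indian buffet process prior.

    Row i joins each existing cluster k with probability m_k / i, where m_k is
    the number of earlier rows in cluster k, and then opens Poisson(alpha / i)
    new clusters. A row may belong to several clusters (overlap), and the
    number of columns is not fixed in advance.
    """
    rng = random.Random(seed)
    counts = []  # m_k: number of rows assigned to cluster k so far
    rows = []
    for i in range(1, n_rows + 1):
        row = [1 if rng.random() < m / i else 0 for m in counts]
        for k, z in enumerate(row):
            counts[k] += z
        k_new = poisson(alpha / i, rng)
        row.extend([1] * k_new)     # row i is a member of every cluster it opens
        counts.extend([1] * k_new)
        rows.append(row)
    n_clusters = len(counts)
    # Earlier rows never saw clusters opened later, so pad them with zeros.
    return [row + [0] * (n_clusters - len(row)) for row in rows]


Z = sample_ibp(50, alpha=3.0, seed=1)
print(len(Z), "rows,", len(Z[0]), "clusters")
```

The concentration parameter `alpha` controls how many clusters appear (the expected count grows roughly like `alpha` times the logarithm of the number of rows), which is what lets the cluster count adapt instead of being set by the analyst.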
Pub Date : 2023-11-08 DOI: 10.1007/s12561-023-09393-7
On Estimation of the Effect Lag of Predictors and Prediction in a Functional Linear Model
Haiyan Liu, Georgios Aivaliotis, Vijay Kumar, Jeanine Houwing-Duistermaat
We propose a functional linear model that predicts a functional response from multiple functional and longitudinal predictors and estimates the effect lags of the predictors. The coefficient functions are expanded in a basis system (e.g., functional principal components or splines), and the basis coefficients are estimated by optimizing a penalized criterion. The effect lags are then determined by a simultaneous search over a pre-designed grid that minimizes a proposed prediction error criterion. We study the mathematical properties of the estimated regression functions and predicted responses, and we evaluate the method's performance through extensive simulations and a real data application on chronic obstructive pulmonary disease (COPD).
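The simultaneous lag search can be illustrated with a heavily simplified sketch. It assumes two scalar time series instead of functional predictors, a scalar coefficient per predictor fitted by ordinary least squares instead of a penalized basis expansion, and hold-out mean squared error in place of the paper's prediction error criterion. All names (`simulate`, `fit_two_regressors`, `search_lags`) and the toy data are hypothetical; only the idea of jointly scanning a grid of lag pairs and keeping the pair with the smallest prediction error comes from the abstract.

```python
import itertools
import random


def simulate(T, lags, betas, noise_sd, seed=0):
    """Toy data: y[t] = b1*x1[t-l1] + b2*x2[t-l2] + noise (illustrative only)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(T)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(T)]
    (l1, l2), (b1, b2) = lags, betas
    y = [0.0] * T
    for t in range(max(lags), T):
        y[t] = b1 * x1[t - l1] + b2 * x2[t - l2] + rng.gauss(0.0, noise_sd)
    return x1, x2, y


def fit_two_regressors(a, b, y):
    """Least-squares fit of y ~ beta1*a + beta2*b (no intercept) via normal equations."""
    s_aa = sum(u * u for u in a)
    s_bb = sum(v * v for v in b)
    s_ab = sum(u * v for u, v in zip(a, b))
    s_ay = sum(u * w for u, w in zip(a, y))
    s_by = sum(v * w for v, w in zip(b, y))
    det = s_aa * s_bb - s_ab ** 2
    return (s_bb * s_ay - s_ab * s_by) / det, (s_aa * s_by - s_ab * s_ay) / det


def search_lags(x1, x2, y, max_lag):
    """Jointly scan (l1, l2) on a grid; score each pair by hold-out MSE."""
    times = list(range(max_lag, len(y)))
    half = len(times) // 2
    train, test = times[:half], times[half:]
    best = None
    for l1, l2 in itertools.product(range(max_lag + 1), repeat=2):
        b1, b2 = fit_two_regressors(
            [x1[t - l1] for t in train],
            [x2[t - l2] for t in train],
            [y[t] for t in train],
        )
        mse = sum((y[t] - b1 * x1[t - l1] - b2 * x2[t - l2]) ** 2 for t in test) / len(test)
        if best is None or mse < best[0]:
            best = (mse, (l1, l2), (b1, b2))
    return best


x1, x2, y = simulate(400, lags=(2, 3), betas=(1.5, -2.0), noise_sd=0.1, seed=0)
best_mse, est_lags, est_betas = search_lags(x1, x2, y, max_lag=4)
print(est_lags, [round(b, 2) for b in est_betas])
```

Searching lag pairs jointly (rather than one predictor at a time) matters when predictors are correlated, because the best lag for one predictor can depend on which lag is used for the other; the grid cost grows as the product of the per-predictor grid sizes.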
Pub Date : 2023-11-04 DOI: 10.1007/s12561-023-09395-5
Bayesian Inference for High Dimensional Cox Models with Gaussian and Diffused-Gamma Priors: A Case Study of Mortality in COVID-19 Patients Admitted to the ICU
Jiyeon Song, Subharup Guha, Yi Li
Pub Date : 2023-11-04 DOI: 10.1007/s12561-023-09398-2
Nonnegative Matrix Factorization with Group and Basis Restrictions
Phillip Shreeves, Jeffrey L. Andrews, Xinchen Deng, Ramie Ali-Adeeb, Andrew Jirasek
Pub Date : 2023-11-03 DOI: 10.1007/s12561-023-09401-w
Introduction to Special Issue on Machine Learning Algorithms in Genomics and Genetics
Yingying Wei
Pub Date : 2023-10-27 DOI: 10.1007/s12561-023-09396-4
Bivariate Analysis of Birth Weight and Gestational Age by Bayesian Distributional Regression with Copulas
Jonathan Rathjens, Arthur Kolbe, Jürgen Hölzer, Katja Ickstadt, Nadja Klein
We analyze perinatal data that include, among other variables, biometric and obstetric information and data on maternal smoking. Birth weight is the primary response variable of interest. Gestational age is usually an important covariate and is included in polynomial form. However, in contrast to such univariate regression, bivariate modeling of birth weight and gestational age is recommended to distinguish effects on each variable, effects on both, and effects on the association between them. Rather than assuming a parametric bivariate distribution, we apply conditional copula regression, in which the marginal distributions of birth weight and gestational age (not necessarily of the same form) and the dependence structure are modeled conditionally on covariates. In the resulting distributional regression model, all parameters of the two marginals and the copula parameter are observation specific. While the Gaussian distribution is suitable for birth weight, the skewed gestational age data are better modeled by the three-parameter Dagum distribution. The Clayton copula performs better than the Gumbel copula and the symmetric Gaussian copula, indicating lower tail dependence (stronger dependence when both variables are low), although this non-linear dependence between birth weight and gestational age is surprisingly weak and is influenced only by Cesarean section. A univariate model that is polynomial in gestational age detects a non-linear trend of birth weight on gestational age. Covariate effects on the expected birth weight are similar in our copula regression model and in a univariate regression model, while distributional copula regression reveals further insights, such as covariate effects on the association between birth weight and gestational age.
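The copula construction described above can be sketched generatively: draw a dependent pair of uniforms from a Clayton copula, then push each through its own marginal quantile function (Dagum for gestational age, Gaussian for birth weight). The sketch uses standard facts about the Clayton copula (Kendall's tau equals theta/(theta + 2); lower tail dependence equals 2^(-1/theta); no upper tail dependence) and the Dagum CDF F(x) = (1 + (x/b)^(-a))^(-p). The log link tying theta to a Cesarean indicator, the coefficient values, and all marginal parameters are hypothetical illustrations, not estimates from the paper, and the sketch only simulates; it does not perform the Bayesian fitting.

```python
import math
import random
from statistics import NormalDist


def clayton_kendall_tau(theta):
    """Kendall's tau of a Clayton copula with theta > 0."""
    return theta / (theta + 2.0)


def clayton_lower_tail(theta):
    """Lower tail dependence coefficient 2**(-1/theta); the upper one is 0."""
    return 2.0 ** (-1.0 / theta)


def clayton_theta(cesarean, gamma0, gamma1):
    """Observation-specific copula parameter via a log link (hypothetical coefficients)."""
    return math.exp(gamma0 + gamma1 * cesarean)


def _open_unit(rng):
    """Uniform draw strictly inside (0, 1)."""
    while True:
        x = rng.random()
        if x > 0.0:
            return x


def sample_clayton(theta, rng):
    """Draw (u, v) from a Clayton copula by inverting the conditional C(v | u)."""
    u, w = _open_unit(rng), _open_unit(rng)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def dagum_cdf(x, a, b, p):
    return (1.0 + (x / b) ** (-a)) ** (-p)


def dagum_quantile(q, a, b, p):
    """Inverse of dagum_cdf in x."""
    return b * (q ** (-1.0 / p) - 1.0) ** (-1.0 / a)


def sample_pair(theta, rng, ga_params=(20.0, 39.5, 0.5), bw_params=(3400.0, 500.0)):
    """One (gestational age, birth weight) draw; marginal parameters are illustrative."""
    u, v = sample_clayton(theta, rng)
    gestational_age = dagum_quantile(u, *ga_params)    # weeks
    birth_weight = NormalDist(*bw_params).inv_cdf(v)  # grams
    return gestational_age, birth_weight


def kendall_tau(xs, ys):
    """Plain O(n^2) sample Kendall's tau (no tie correction)."""
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


rng = random.Random(0)
theta = clayton_theta(cesarean=1, gamma0=0.0, gamma1=math.log(2.0))  # about 2.0
pairs = [sample_pair(theta, rng) for _ in range(800)]
ga, bw = zip(*pairs)
print(round(kendall_tau(ga, bw), 3), round(clayton_kendall_tau(theta), 3))
```

Because both marginal quantile functions are increasing, the rank-based Kendall's tau of the simulated (gestational age, birth weight) pairs matches that of the underlying copula, which is why the dependence and the marginals can be specified and given covariate effects separately.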
Pub Date : 2023-10-26 DOI: 10.1007/s12561-023-09400-x
AI-Powered Bayesian Statistics in Biomedicine
Qiwei Li
Pub Date : 2023-10-25 DOI: 10.1007/s12561-023-09392-8
Efficient Estimation of Semiparametric Transformation Model with Interval-Censored Data in Two-Phase Cohort Studies
Fei Gao, Kwun Chuen Gary Chan
Pub Date : 2023-10-20 DOI: 10.1007/s12561-023-09391-9
Variable Selection for Nonlinear Covariate Effects with Interval-Censored Failure Time Data
Tian Tian, Jianguo Sun
Pub Date : 2023-10-16 DOI: 10.1007/s12561-023-09390-w
A Resampling Approach for Causal Inference on Novel Two-Point Time-Series with Application to Identify Risk Factors for Type-2 Diabetes and Cardiovascular Disease
Xiaowu Dai, Saad Mouti, Marjorie Lima do Vale, Sumantra Ray, Jeffrey Bohn, Lisa Goldberg
Two-point time-series data, characterized by baseline and follow-up observations, are frequently encountered in health research. We study a novel two-point time-series structure without a control group, motivated by an observational dataset collected in routine clinical practice to monitor key risk markers of type-2 diabetes (T2D) and cardiovascular disease (CVD). We propose a resampling approach called “I-Rand” that independently samples one of the two time points for each individual and makes inferences on the estimated causal effects based on matching methods. The proposed method is illustrated with data from a service-based dietary intervention promoting a low-carbohydrate diet (LCD), designed to affect the risk of T2D and CVD. The baseline data contain a pre-intervention health record for each study participant, and health data after the LCD intervention are recorded at the follow-up visit, yielding a two-point time-series pattern without a parallel control group. Using this approach, we find that obesity is a significant risk factor for T2D and CVD, and that an LCD approach can significantly mitigate the risks of T2D and CVD. We provide code that implements our method.
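The abstract describes I-Rand at a high level: for each individual, one of the two time points is drawn at random, so the follow-up (post-intervention) records and baseline (pre-intervention) records that remain come from different individuals, and a matching estimator is applied to them; repeating the draw yields a distribution of effect estimates. The minimal sketch below fills in unstated details with assumptions: a single time-invariant covariate, 1-nearest-neighbour matching with replacement, and a 95% percentile interval over the draws. The names `make_records`, `irand_once`, and `irand`, and the simulated data, are hypothetical; this is not the authors' released code.

```python
import random
import statistics


def make_records(n, effect, seed=0):
    """Toy two-point data: (covariate, baseline outcome, follow-up outcome) per person."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = rng.gauss(30.0, 4.0)  # hypothetical time-invariant covariate (BMI-like)
        y0 = 0.1 * x + rng.gauss(0.0, 0.5)           # before the intervention
        y1 = 0.1 * x + effect + rng.gauss(0.0, 0.5)  # after the intervention
        records.append((x, y0, y1))
    return records


def irand_once(records, rng):
    """One I-Rand-style draw: keep one random time point per person, then match."""
    treated, control = [], []
    for x, y0, y1 in records:
        if rng.random() < 0.5:
            treated.append((x, y1))  # follow-up record plays the treated role
        else:
            control.append((x, y0))  # baseline record plays the control role
    if not treated or not control:
        return None
    # 1-nearest-neighbour matching on x, with replacement (an assumed choice).
    diffs = []
    for x, y in treated:
        _, y_match = min(control, key=lambda c: abs(c[0] - x))
        diffs.append(y - y_match)
    return statistics.fmean(diffs)


def irand(records, n_resamples=100, seed=0):
    """Repeat the draw; return the mean estimate and a 95% percentile interval."""
    rng = random.Random(seed)
    draws = (irand_once(records, rng) for _ in range(n_resamples))
    estimates = sorted(e for e in draws if e is not None)
    m = len(estimates)
    lo = estimates[int(0.025 * (m - 1))]
    hi = estimates[int(0.975 * (m - 1))]
    return statistics.fmean(estimates), lo, hi


records = make_records(200, effect=-1.0, seed=1)
estimate, lo, hi = irand(records, n_resamples=100, seed=2)
print(round(estimate, 2), (round(lo, 2), round(hi, 2)))
```

Keeping only one record per person in each draw means no individual is ever matched to themselves, so the treated and control samples within a draw are independent; averaging over many draws reduces the dependence of the result on any single random split.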