Pub Date : 2024-12-09 DOI: 10.1007/s12561-024-09470-5
Subharup Guha, Yi Li
Comparative meta-analyses of groups of subjects that integrate multiple observational studies rely on estimated propensity scores (PSs) to mitigate covariate imbalances. However, PS estimation faces theoretical and practical challenges posed by high-dimensional covariates. Motivated by an integrative analysis of breast cancer patients across seven medical centers, this paper tackles the challenges of integrating multiple observational datasets. The proposed inferential technique, called Bayesian Motif Submatrices for Covariates (B-MSC), addresses the curse of dimensionality through a hybrid of Bayesian and frequentist approaches. B-MSC uses nonparametric Bayesian "Chinese restaurant" processes to eliminate redundancy in the high-dimensional covariates and discover latent motifs, or lower-dimensional structures. With these motifs as potential predictors, standard regression techniques can be used to accurately infer the PSs and facilitate covariate-balanced group comparisons. Simulations and a meta-analysis of the motivating cancer investigation demonstrate the efficacy of the B-MSC approach in accurately estimating the propensity scores and efficiently addressing covariate imbalance when integrating observational health studies with high-dimensional covariates.
{"title":"Bayesian Estimation of Propensity Scores for Integrating Multiple Cohorts with High-Dimensional Covariates.","authors":"Subharup Guha, Yi Li","doi":"10.1007/s12561-024-09470-5","DOIUrl":"10.1007/s12561-024-09470-5","url":null,"abstract":"<p><p>Comparative meta-analyses of groups of subjects by integrating multiple observational studies rely on estimated propensity scores (PSs) to mitigate covariate imbalances. However, PS estimation grapples with the theoretical and practical challenges posed by high-dimensional covariates. Motivated by an integrative analysis of breast cancer patients across seven medical centers, this paper tackles the challenges of integrating multiple observational datasets. The proposed inferential technique, called Bayesian Motif Submatrices for Covariates (B-MSC), addresses the curse of dimensionality by a hybrid of Bayesian and frequentist approaches. B-MSC uses nonparametric Bayesian \"Chinese restaurant\" processes to eliminate redundancy in the high-dimensional covariates and discover latent <i>motifs</i> or lower-dimensional structures. With these motifs as potential predictors, standard regression techniques can be utilized to accurately infer the PSs and facilitate covariate-balanced group comparisons. 
Simulations and meta-analysis of the motivating cancer investigation demonstrate the efficacy of the B-MSC approach to accurately estimate the propensity scores and efficiently address covariate imbalance when integrating observational health studies with high-dimensional covariates.</p>","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320959/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144973414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
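The abstract above says that, once low-dimensional motifs are available, "standard regression techniques" can be used to infer the propensity scores. The following is a minimal sketch of that downstream step only, under loud assumptions: it is not the B-MSC procedure, the two synthetic predictors merely stand in for motif-derived predictors, there are only two groups, and the names `fit_logistic`, `weighted_mean`, `raw_gap`, and `ipw_gap` are hypothetical. It fits a logistic regression by plain gradient ascent and checks how inverse-probability weighting changes the between-group gap in one covariate.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, t, lr=1.0, n_iter=500):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = ti - sigmoid(eta)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Synthetic data (assumption): two low-dimensional predictors per subject,
# with group membership depending on both of them.
random.seed(0)
n = 1000
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
group = [1 if random.random() < sigmoid(0.8 * x[0] - 0.5 * x[1]) else 0 for x in X]

# Estimated propensity score: P(group = 1 | predictors).
beta = fit_logistic(X, group)
ps = [sigmoid(beta[0] + beta[1] * x[0] + beta[2] * x[1]) for x in X]

# Inverse-probability weights: 1/ps in group 1, 1/(1 - ps) in group 0.
ipw = [g / p + (1 - g) / (1 - p) for g, p in zip(group, ps)]

# Compare the gap in the first predictor's mean before and after weighting.
x1 = [x[0] for x in X]
in_g1 = [float(g) for g in group]
in_g0 = [1.0 - g for g in group]
raw_gap = weighted_mean(x1, in_g1) - weighted_mean(x1, in_g0)
ipw_gap = (weighted_mean(x1, [w * g for w, g in zip(ipw, in_g1)])
           - weighted_mean(x1, [w * g for w, g in zip(ipw, in_g0)]))
print("unweighted gap:", round(raw_gap, 3), "weighted gap:", round(ipw_gap, 3))
```

In this toy setting the weighted gap should be much closer to zero than the unweighted one, which is the sense in which estimated propensity scores mitigate covariate imbalance. With genuinely high-dimensional covariates this direct regression is exactly the step that becomes unreliable, which is the problem the paper targets.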
Pub Date : 2024-09-27 DOI: 10.1007/s12561-024-09457-2
Zeyu Yang, Hua Liang, Huiling Liu, Shannon Barth, Morgan Byrne, Elisabeth Andersen, Vinay Bhandaru, Amanda Castel
We propose a projection-based test to check logistic regression models and apply the test to study telehealth utilization during the COVID-19 pandemic among patients with HIV. The test is shown to be consistent and can detect root-n local alternatives. Applying the proposed test to a COVID-19 dataset reveals that the probability of telehealth utilization depends on the following variables: overweight, education, age, and the interaction between age and ethnicity. Specifically, the probability for the Hispanic group decreases with age, whereas the non-Hispanic group shows no trend in the probability with age. This interaction may be overlooked when other goodness-of-fit methods are applied. The simulation studies also show that the proposed method performs remarkably well compared to its competitors.
{"title":"Model Checking for Logistic Models with Study of Telehealth During the COVID-19 Pandemic Among PWH in DC.","authors":"Zeyu Yang, Hua Liang, Huiling Liu, Shannon Barth, Morgan Byrne, Elisabeth Andersen, Vinay Bhandaru, Amanda Castel","doi":"10.1007/s12561-024-09457-2","DOIUrl":"10.1007/s12561-024-09457-2","url":null,"abstract":"<p><p>We propose a projection-based test to check logistic regression models and apply the test to study telehealth utilization during the COVID-19 pandemic among patients with HIV. The test is shown to be consistent and can detect root- <math><mi>n</mi></math> local alternatives. The use of the proposed test to investigate a COVID-19 dataset reveals that the probability of telehealth utilization depends on the following variables: overweight, education, and age and the interaction between age and ethnicity. Specifically, the probability for the Hispanic group decreases with older age, whereas there is no trend between the probability with the age for the group of non-Hispanic. This interaction may be ignored when we apply other goodness-of-fit methods. The simulation studies also show the performance of the proposed method is remarkably attractive compared to its competitors.</p>","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12306521/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144754775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-13 DOI: 10.1007/s12561-024-09449-2
Yuying Lu, Tian Gu, Rui Duan
Large-scale genomics data combined with Electronic Health Records (EHRs) illuminate the path towards personalized disease management and enhanced medical interventions. However, the absence of "gold standard" disease labels makes the development of machine learning models a challenging task. Additionally, imbalances in demographic representation within datasets compromise the development of unbiased healthcare solutions. In response to these challenges, we introduce FEderated Semi-Supervised Transfer Learning (FEST) for improving disease risk predictions in underrepresented populations. FEST facilitates the collaborative training of models across various institutions by leveraging both labeled and unlabeled data from diverse subpopulations. It addresses distributional variations across different populations and healthcare institutions by combining density ratio reweighting and model calibration techniques. Federated learning algorithms are developed for training models using only summary-level statistics. We perform simulation studies to assess the efficacy of FEST in comparison with a few alternative methods. Subsequently, we apply FEST to train a genetic risk prediction model for type 2 diabetes that targets the African-Ancestry population using data from the Massachusetts General Brigham (MGB) Biobank. Both our computational experiments and real-world data application underline the superior performance of FEST over competing methods.
{"title":"Enhancing Genetic Risk Prediction through Federated Semi-Supervised Transfer Learning with Inaccurate Electronic Health Record Data.","authors":"Yuying Lu, Tian Gu, Rui Duan","doi":"10.1007/s12561-024-09449-2","DOIUrl":"10.1007/s12561-024-09449-2","url":null,"abstract":"<p><p>Large-scale genomics data combined with Electronic Health Records (EHRs) illuminate the path towards personalized disease management and enhanced medical interventions. However, the absence of \"gold standard\" disease labels makes the development of machine learning models a challenging task. Additionally, imbalances in demographic representation within datasets compromise the development of unbiased healthcare solutions. In response to these challenges, we introduce FEderated Semi-Supervised Transfer Learning (FEST) for improving disease risk predictions in underrepresented populations. FEST facilitates the collaborative training of models across various institutions by leveraging both labeled and unlabeled data from diverse subpopulations. It addresses distributional variations across different populations and healthcare institutions by combining density ratio reweighting and model calibration techniques. Federated learning algorithms are developed for training models using only summary-level statistics. We perform simulation studies to assess the efficacy of FEST in comparisons with a few alternative methods. Subsequently, we apply FEST to training a genetic risk prediction model for type 2 diabetes that targets the African-Ancestry population using data from the Massachusetts General Brigham (MGB) Biobank. 
Both our computational experiments and real-world data application underline the superior performance of FEST over competing methods.</p>","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12409711/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145013364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
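The abstract above mentions density ratio reweighting as one ingredient for handling distributional differences between populations. As a rough illustration of that general idea only (not the FEST algorithm: there is no federation, semi-supervision, or calibration here), the sketch below estimates a density ratio with a probabilistic classifier that separates "source" from "target" samples, a standard substitute technique. The one-dimensional synthetic data and the names `fit_logistic_1d` and `density_ratio` are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_1d(x, y, lr=1.0, n_iter=500):
    """One-covariate logistic regression by gradient ascent; returns (intercept, slope)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(b0 + b1 * xi)
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Synthetic covariate (assumption): the source and target populations differ in mean.
random.seed(1)
n_source, n_target = 1000, 1000
source = [random.gauss(0.0, 1.0) for _ in range(n_source)]
target = [random.gauss(1.0, 1.0) for _ in range(n_target)]

# Train a classifier to tell target (label 1) from source (label 0).
pooled = source + target
labels = [0] * n_source + [1] * n_target
b0, b1 = fit_logistic_1d(pooled, labels)


def density_ratio(x):
    """Estimated p_target(x) / p_source(x).

    By Bayes' rule this equals the classifier odds P(target | x) / P(source | x),
    which is exp(linear predictor) for a logistic model, rescaled by the
    sample-size ratio n_source / n_target.
    """
    return (n_source / n_target) * math.exp(b0 + b1 * x)


# Reweight the source sample so that it resembles the target population.
weights = [density_ratio(x) for x in source]
weighted_source_mean = sum(w * x for w, x in zip(weights, source)) / sum(weights)
source_mean = sum(source) / n_source
target_mean = sum(target) / n_target
print("source:", round(source_mean, 3),
      "reweighted source:", round(weighted_source_mean, 3),
      "target:", round(target_mean, 3))
```

After reweighting, the source sample's mean should move toward the target mean. In a federated setting the raw samples would not be pooled as they are here; the abstract states that the paper's algorithms use only summary-level statistics, which this single-site sketch does not attempt to reproduce.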
Pub Date : 2024-08-11 DOI: 10.1007/s12561-024-09452-7
Wenrui Li, Qiyiwen Zhang, Kewen Qu, Qi Long
There is a growing body of literature on factor analysis that can capture individual and shared structures in multi-modal data. However, few of these approaches incorporate biological knowledge such as functional genomics and functional metabolomics. Graph-guided statistical learning methods that can incorporate knowledge of underlying networks have been shown to improve prediction and classification accuracy, and yield more interpretable results. However, these methods typically use graphs extracted from existing databases or rely on subject-matter expertise, both of which are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian factor model that can account for network noise and identify globally shared, partially shared and modality-specific latent factors in multimodal data. Specifically, we use two sources of network information, including the noisy graph extracted from existing databases and the estimated graph from observed features in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This model is coupled with the Bayesian factor analysis model with shrinkage priors to encourage feature-wise and modal-wise sparsity, thereby allowing feature selection and identification of factors of each type. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations, and through analyses of gene expression and metabolomics datasets for Alzheimer's disease.
{"title":"Graph-guided Bayesian Factor Model for Integrative Analysis of Multi-modal Data with Noisy Network Information.","authors":"Wenrui Li, Qiyiwen Zhang, Kewen Qu, Qi Long","doi":"10.1007/s12561-024-09452-7","DOIUrl":"10.1007/s12561-024-09452-7","url":null,"abstract":"<p><p>There is a growing body of literature on factor analysis that can capture individual and shared structures in multi-modal data. However, few of these approaches incorporate biological knowledge such as functional genomics and functional metabolomics. Graph-guided statistical learning methods that can incorporate knowledge of underlying networks have been shown to improve prediction and classification accuracy, and yield more interpretable results. Moreover, these methods typically use graphs extracted from existing databases or rely on subject matter expertise which are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian factor model that can account for network noise and identify globally shared, partially shared and modality-specific latent factors in multimodal data. Specifically, we use two sources of network information, including the noisy graph extracted from existing databases and the estimated graph from observed features in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This model is coupled with the Bayesian factor analysis model with shrinkage priors to encourage feature-wise and modal-wise sparsity, thereby allowing feature selection and identification of factors of each type. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. 
We demonstrate the advantages of our method over existing methods in simulations, and through analyses of gene expression and metabolomics datasets for Alzheimer's disease.</p>","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":" ","pages":""},"PeriodicalIF":0.4,"publicationDate":"2024-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12221265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144691908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In biomedical studies, longitudinal processes are collected until time-to-event, sometimes on nested timescales (for example, days within months). Most of the literature on joint modeling of longitudinal and time-to-event data has focused on modeling the mean or dispersion of the longitudinal process with the hazard for time-to-event. However, based on the motivating studies, it may be of interest to investigate how the cycle-level geometric features (such as the curvature, location and height of a peak) of a cyclical longitudinal process are associated with the time-to-event being studied. We propose a shared parameter joint model for a cyclical longitudinal process and a discrete survival time, measured on nested timescales, where the cycle-varying geometric feature is modeled through a linear mixed effects model and a proportional hazards model for the discrete survival time. The proposed approach allows for prediction of survival probabilities for future subjects based on their available longitudinal measurements. Our proposed model and approach are illustrated through simulation and analysis of Stress and Time-to-Pregnancy, a component of the Oxford Conception Study. A joint modeling approach was used to assess whether the cycle-specific geometric features of the luteinizing hormone measurements, such as its peak or its curvature, are associated with time-to-pregnancy (TTP).
{"title":"Joint Modeling of Geometric Features of Longitudinal Process and Discrete Survival Time Measured on Nested Timescales: An Application to Fecundity Studies.","authors":"Abhisek Saha, Ling Ma, Animikh Biswas, Rajeshwari Sundaram","doi":"10.1007/s12561-023-09381-x","DOIUrl":"10.1007/s12561-023-09381-x","url":null,"abstract":"<p><p>In biomedical studies, longitudinal processes are collected till time-to-event, sometimes on nested timescales (example, days within months). Most of the literature in joint modeling of longitudinal and time-to-event data has focused on modeling the mean or dispersion of the longitudinal process with the hazard for time-to-event. However, based on the motivating studies, it may be of interest to investigate how the cycle-level <i>geometric features</i> (such as the curvature, location and height of a peak), of a cyclical longitudinal process is associated with the time-to-event being studied. We propose a shared parameter joint model for a cyclical longitudinal process and a discrete survival time, measured on nested timescales, where the cycle-varying geometric feature is modeled through a linear mixed effects model and a proportional hazards model for the discrete survival time. The proposed approach allows for prediction of survival probabilities for future subjects based on their available longitudinal measurements. Our proposed model and approach is illustrated through simulation and analysis of Stress and Time-to-Pregnancy, a component of Oxford Conception Study. 
A joint modeling approach was used to assess whether the cycle-specific geometric features of the luteinizing hormone measurements, such as its peak or its curvature, are associated with time-to-pregnancy (TTP).</p>","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":"1 1","pages":"86-106"},"PeriodicalIF":0.4,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12687766/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46399657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-02 DOI: 10.1007/s12561-023-09411-8
Chenguang Zhang, Masayuki Nigo, Shivani Patel, Duo Yu, Edward Septimus, Hulin Wu
{"title":"Use of Real-World EMR Data to Rapidly Evaluate Treatment Effects of Existing Drugs for Emerging Infectious Diseases: Remdesivir for COVID-19 Treatment as an Example","authors":"Chenguang Zhang, Masayuki Nigo, Shivani Patel, Duo Yu, Edward Septimus, Hulin Wu","doi":"10.1007/s12561-023-09411-8","DOIUrl":"https://doi.org/10.1007/s12561-023-09411-8","url":null,"abstract":"","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":"112 45","pages":"1-30"},"PeriodicalIF":1.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139391299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-17 DOI: 10.1007/s12561-023-09404-7
Joseph Boyle, Mary H. Ward, Stella Koutros, M. Karagas, M. Schwenn, Alison T. Johnson, Debra T. Silverman, David C. Wheeler
{"title":"Modeling Historic Arsenic Exposures and Spatial Risk for Bladder Cancer","authors":"Joseph Boyle, Mary H. Ward, Stella Koutros, M. Karagas, M. Schwenn, Alison T. Johnson, Debra T. Silverman, David C. Wheeler","doi":"10.1007/s12561-023-09404-7","DOIUrl":"https://doi.org/10.1007/s12561-023-09404-7","url":null,"abstract":"","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":"5 11","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138966433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-12 DOI: 10.1007/s12561-023-09409-2
Alexander P. Keil, Katie M. O’Brien
{"title":"Considerations and Targeted Approaches to Identifying Bad Actors in Exposure Mixtures","authors":"Alexander P. Keil, Katie M. O’Brien","doi":"10.1007/s12561-023-09409-2","DOIUrl":"https://doi.org/10.1007/s12561-023-09409-2","url":null,"abstract":"","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":"6 13","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139006657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-02 DOI: 10.1007/s12561-023-09402-9
Yao Li, Wei Xu
{"title":"Causal Mediation Tree Model for Feature Identification on High-Dimensional Mediators","authors":"Yao Li, Wei Xu","doi":"10.1007/s12561-023-09402-9","DOIUrl":"https://doi.org/10.1007/s12561-023-09402-9","url":null,"abstract":"","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":"79 8","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138606084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-01 DOI: 10.1007/s12561-023-09407-4
Trevor J. Thomson, X. Joan Hu, Bohdan Nosyk
{"title":"Evaluating Effects of Various Exposures on Mortality Risk of Opioid Use Disorders with Linked Administrative Databases","authors":"Trevor J. Thomson, X. Joan Hu, Bohdan Nosyk","doi":"10.1007/s12561-023-09407-4","DOIUrl":"https://doi.org/10.1007/s12561-023-09407-4","url":null,"abstract":"","PeriodicalId":45094,"journal":{"name":"Statistics in Biosciences","volume":" 32","pages":""},"PeriodicalIF":1.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138616782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}