Pub Date: 2024-07-16 DOI: 10.1080/26941899.2024.2376535
Caleb Weaver, Luo Xiao, Qiuting Wen, Yu-Chien Wu, Jaroslaw Harezlak
Title: Biclustering Multivariate Longitudinal Data with Application to Recovery Trajectories of White Matter After Sport-Related Concussion
Journal: Data Science in Science
Pub Date: 2024-01-01 Epub Date: 2024-06-16 DOI: 10.1080/26941899.2024.2360892
Ruiyang Li, Xi Zhu, Seonjoo Lee
In mediation analysis, the exposure often influences the mediating effect, i.e., there is an interaction between exposure and mediator on the dependent variable. When the mediator is high-dimensional, it is necessary to identify non-zero mediators (M) and exposure-by-mediator (X-by-M) interactions. Although several high-dimensional mediation methods can naturally handle X-by-M interactions, little research addresses preserving the underlying hierarchical structure between the main effects and the interactions. To fill this knowledge gap, we develop the XMInt procedure to select M and X-by-M interactions in the high-dimensional mediators setting while preserving the hierarchical structure. Our proposed method employs a sequential regularization-based forward-selection approach to identify the mediators and their hierarchically preserved interaction with exposure. Our numerical experiments showed promising selection results. Further, we applied our method to ADNI morphological data and examined the role of cortical thickness and subcortical volumes on the effect of amyloid-beta accumulation on cognitive performance, which could be helpful in understanding the brain compensation mechanism.
Title: Model Selection for Exposure-Mediator Interaction
Journal: Data Science in Science
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11210705/pdf/
Pub Date: 2024-01-01 Epub Date: 2024-03-06 DOI: 10.1080/26941899.2024.2309403
Ganzhong Tian, John Hanfelt, James Lah, Benjamin B Risk
There is no gold standard for the diagnosis of Alzheimer's disease (AD), except from autopsies, which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can simultaneously identify clusters from multiple biomarkers while learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate tobit regressions) to over 3,000 participants from the Emory Goizueta Alzheimer's Disease Research Center and Emory Healthy Brain Study to examine amyloid-beta peptide 1-42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on mixture of regressions with a truncated multivariate Gaussian distribution: software availability; inference; and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile and non-AD pathology. The CSF profiles differed by race, gender and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.
Title: Mixture of regressions with multivariate responses for discovering subtypes in Alzheimer's biomarkers with detection limits
Journal: Data Science in Science
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11044119/pdf/
The brain’s functional connectome continually rewires throughout an organism’s life. In this study, we sought to elucidate the operational principles of such rewiring in mouse primary motor cortex (M1) by analyzing calcium imaging of layer 2/3 (L2/3) and layer 5 (L5) neuronal activity in M1 of awake mice while they learned a lever-press task. Our results show that L2/3 and L5 functional connectomes follow a similar learning-induced rewiring trajectory. More specifically, the connectomes rewire in a biphasic manner: functional connectivity increases over the first few learning sessions and is then gradually pruned to return to a homeostatic level of network density. We demonstrated that the increase of network connectivity in L2/3 connectomes, but not in L5, generates neuronal co-firing activity that correlates with improved motor performance (shorter cue-to-reward time), while motor performance remains relatively stable throughout the pruning phase. The results show a biphasic rewiring principle that involves the maximization of reward/performance and maintenance of network density. Finally, we demonstrated that the connectome rewiring in L2/3 is clustered around a core set of movement-associated neurons that form a highly interconnected hub in the connectomes, and that the activity of these core neurons stably encodes movement throughout learning.
Title: Rewiring Dynamics of Functional Connectomes during Motor-Skill Learning
Authors: Saber Meamardoost, Mahasweta Bhattacharya, Eun Jung Hwang, Chi Ren, Linbing Wang, Claudia Mewes, Ying Zhang, Takaki Komiyama, Rudiyanto Gunawan
Pub Date: 2023-10-05 DOI: 10.1080/26941899.2023.2260431
Journal: Data Science in Science
Pub Date: 2023-08-21 DOI: 10.1080/26941899.2023.2219707
Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Delbare, Myung Hee Lee, Martin T. Wells
Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag R² (LLR²). Importantly, the calculation of the LLR² leverages biological databases that document known interactions amongst genes; this information is automatically used to define informative prior distributions on the ODE model’s parameters. As a result, the LLR² is a biologically informed metric that can be used to identify clusters or networks of functionally related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein’s unbiased risk estimate that optimally balance the ODE model’s fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. These results reveal new insights about the dynamics of biological systems.
Title: An Empirical Bayes Approach to Estimating Dynamic Models of Co-Regulated Gene Expression
Journal: Data Science in Science
Pub Date: 2023-07-19 DOI: 10.1080/26941899.2023.2231061
Yoonji Kim, S. Brodnitz, O. Chkrebtii, F. Bingham
Title: Evaluation of Seasonality in Sea Surface Salinity Balance Equation via Function Registration
Journal: Data Science in Science
Pub Date: 2023-07-07 DOI: 10.1080/26941899.2023.2216814
Carolina Euán, M. Fiecas, H. Ombao, D. Matteson
Title: Data Science in Science: Special Issue on Data Science in the Brain Sciences
Journal: Data Science in Science
Pub Date: 2023-05-10 DOI: 10.1080/26941899.2023.2200856
M. Klanderman, Junho Lee, K. Villez, T. Cath, A. Hering
Title: Adaptive Online Multivariate Signal Extraction With Locally Weighted Robust Polynomial Regression
Journal: Data Science in Science
Pub Date: 2023-03-06 DOI: 10.1080/26941899.2023.2176379
Yu. P. Shapovalova, M. Eichler
Title: Measuring and Quantifying Uncertainty in Volatility Spillovers: A Bayesian Approach
Journal: Data Science in Science