Pub Date: 2025-01-01 | Epub Date: 2025-07-03 | DOI: 10.1080/26941899.2025.2523871
Jonathan R Holt, Stefanee Tillman, Javan Carter, Edward Preble, Sheryl C Cates, Daniel Brannock, Michael Long, John McCarthy, Leslie Zapata Leiva, Jamboor K Vishwanatha, Toufeeq Syed, Legand Burge, Robert T Mallet, Shelly Kowalczyk, Jennifer D Uhrig, Megan A Lewis
Machine learning is revolutionizing health research by enabling scalable analysis across complex datasets. The All of Us Research Program offers unprecedented access to a wealth of health data. To harness this potential, researchers must navigate the All of Us database structure, develop machine learning skills, and apply coding effectively. This paper presents case studies designed to impart these skills using the All of Us Researcher Workbench. Our case studies cover critical topics, such as dataset selection, data cleaning, machine learning applications, and visualization in Python, which together provide the foundation of a targeted training program. Evaluated through pre- and post-program surveys, the program significantly boosted participants' machine learning competencies. By detailing our approach and findings, we aim to guide researchers in harnessing the full potential of the All of Us dataset, thereby advancing precision medicine.
{"title":"Enhancing Health Research with Machine Learning: Practical Case Studies Using the <i>All of Us</i> Researcher Workbench.","authors":"Jonathan R Holt, Stefanee Tillman, Javan Carter, Edward Preble, Sheryl C Cates, Daniel Brannock, Michael Long, John McCarthy, Leslie Zapata Leiva, Jamboor K Vishwanatha, Toufeeq Syed, Legand Burge, Robert T Mallet, Shelly Kowalczyk, Jennifer D Uhrig, Megan A Lewis","doi":"10.1080/26941899.2025.2523871","DOIUrl":"https://doi.org/10.1080/26941899.2025.2523871","url":null,"abstract":"<p><p>Machine learning is revolutionizing health research by enabling scalable analysis across complex datasets. The <i>All of Us</i> Research Program offers unprecedented access to a wealth of health data. To harness this potential, researchers must navigate the <i>All of Us</i> database structure, develop machine learning skills, and apply coding effectively. This paper presents case studies designed to impart these skills using the <i>All of Us</i> Researcher Workbench. Our case studies cover critical topics, such as dataset selection, data cleaning, machine learning applications, and visualization in Python, which together provide the foundation of a targeted training program. Evaluated through pre- and post-program surveys, the program significantly boosted participants' machine learning competencies. 
By detailing our approach and findings, we aim to guide researchers in harnessing the full potential of the <i>All of Us</i> dataset, thereby advancing precision medicine.</p>","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362663/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-16 | DOI: 10.1080/26941899.2024.2376535
Caleb Weaver, Luo Xiao, Qiuting Wen, Yu-Chien Wu, Jaroslaw Harezlak
{"title":"Biclustering Multivariate Longitudinal Data with Application to Recovery Trajectories of White Matter After Sport-Related Concussion","authors":"Caleb Weaver, Luo Xiao, Qiuting Wen, Yu-Chien Wu, Jaroslaw Harezlak","doi":"10.1080/26941899.2024.2376535","DOIUrl":"https://doi.org/10.1080/26941899.2024.2376535","url":null,"abstract":"","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":" 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141832378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-06-16 | DOI: 10.1080/26941899.2024.2360892
Ruiyang Li, Xi Zhu, Seonjoo Lee
In mediation analysis, the exposure often influences the mediating effect, i.e., there is an interaction between exposure and mediator on the dependent variable. When the mediator is high-dimensional, it is necessary to identify non-zero mediators (M) and exposure-by-mediator (X-by-M) interactions. Although several high-dimensional mediation methods can naturally handle X-by-M interactions, research is scarce in preserving the underlying hierarchical structure between the main effects and the interactions. To fill the knowledge gap, we develop the XMInt procedure to select M and X-by-M interactions in the high-dimensional mediators setting while preserving the hierarchical structure. Our proposed method employs a sequential regularization-based forward-selection approach to identify the mediators and their hierarchically preserved interaction with exposure. Our numerical experiments showed promising selection results. Further, we applied our method to ADNI morphological data and examined the role of cortical thickness and subcortical volumes on the effect of amyloid-beta accumulation on cognitive performance, which could be helpful in understanding the brain compensation mechanism.
{"title":"Model Selection for Exposure-Mediator Interaction.","authors":"Ruiyang Li, Xi Zhu, Seonjoo Lee","doi":"10.1080/26941899.2024.2360892","DOIUrl":"10.1080/26941899.2024.2360892","url":null,"abstract":"<p><p>In mediation analysis, the exposure often influences the mediating effect, i.e., there is an interaction between exposure and mediator on the dependent variable. When the mediator is high-dimensional, it is necessary to identify non-zero mediators <math> <mrow><mfenced><mi>M</mi></mfenced> </mrow> </math> and exposure-by-mediator ( <math><mi>X</mi></math> -by- <math><mi>M</mi></math> ) interactions. Although several high-dimensional mediation methods can naturally handle <math><mi>X</mi></math> -by- <math><mi>M</mi></math> interactions, research is scarce in preserving the underlying hierarchical structure between the main effects and the interactions. To fill the knowledge gap, we develop the XMInt procedure to select <math><mi>M</mi></math> and <math><mi>X</mi></math> -by- <math><mi>M</mi></math> interactions in the high-dimensional mediators setting while preserving the hierarchical structure. Our proposed method employs a sequential regularization-based forward-selection approach to identify the mediators and their hierarchically preserved interaction with exposure. Our numerical experiments showed promising selection results. 
Further, we applied our method to ADNI morphological data and examined the role of cortical thickness and subcortical volumes on the effect of amyloid-beta accumulation on cognitive performance, which could be helpful in understanding the brain compensation mechanism.</p>","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11210705/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141473202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-11-07 | DOI: 10.1080/26941899.2024.2415690
Yi Tang Chen, Sebastian Kurtek
We use a geometric approach to jointly characterize tumor shape and intensity along the tumor contour, as captured in magnetic resonance images, in the context of glioblastoma multiforme. Key properties of the proposed shape+intensity representation include invariance to translation, scale, rotation and reparameterization, which enable objective characterization and comparison of these crucial tumor features. The representation further allows the user to tune the emphasis of the shape and intensity components during registration, comparison and statistical summarization (averaging, computation of overall variance and exploration of variability via principal component analysis). In addition, we define a composite distance that is able to integrate shape and intensity information from two imaging modalities. The proposed framework can be integrated with distance-based clustering for the purpose of discovering groups of subjects with distinct survival prognosis. When applied to a cohort of subjects with glioblastoma multiforme, we discover groups with large median survival differences. We further tie the subjects' cluster memberships to tumor heterogeneity. Our results suggest that tumor shape variation plays an important role in disease prognosis.
{"title":"Assessment of Glioblastoma Multiforme Tumor Heterogeneity via MRI-derived Shape and Intensity Features.","authors":"Yi Tang Chen, Sebastian Kurtek","doi":"10.1080/26941899.2024.2415690","DOIUrl":"10.1080/26941899.2024.2415690","url":null,"abstract":"<p><p>We use a geometric approach to jointly characterize tumor shape and intensity along the tumor contour, as captured in magnetic resonance images, in the context of glioblastoma multiforme. Key properties of the proposed shape+intensity representation include invariance to translation, scale, rotation and reparameterization, which enable objective characterization and comparison of these crucial tumor features. The representation further allows the user to tune the emphasis of the shape and intensity components during registration, comparison and statistical summarization (averaging, computation of overall variance and exploration of variability via principal component analysis). In addition, we define a composite distance that is able to integrate shape and intensity information from two imaging modalities. The proposed framework can be integrated with distance-based clustering for the purpose of discovering groups of subjects with distinct survival prognosis. When applied to a cohort of subjects with glioblastoma multiforme, we discover groups with large median survival differences. We further tie the subjects' cluster memberships to tumor heterogeneity. 
Our results suggest that tumor shape variation plays an important role in disease prognosis.</p>","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12124832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144200928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
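The invariance properties named in the Chen and Kurtek abstract above can be illustrated with a small sketch. It covers only translation and scale (not rotation or reparameterization, which the paper also handles) and is not the authors' representation: the function name `normalize_contour`, the unit-perimeter convention, and the sample contour are all assumptions made for illustration.

```python
import math


def normalize_contour(points):
    """Center a closed 2-D contour at its centroid and rescale it to unit perimeter.

    Illustrative only: removes translation and scale, nothing else.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Perimeter of the closed polygon through the sampled points.
    perimeter = sum(
        math.dist(centered[i], centered[(i + 1) % n]) for i in range(n)
    )
    return [(x / perimeter, y / perimeter) for x, y in centered]


# Hypothetical ellipse-like contour and a shifted, enlarged copy of it.
contour = [
    (2 * math.cos(2 * math.pi * k / 50), math.sin(2 * math.pi * k / 50))
    for k in range(50)
]
moved = [(3 * x + 10, 3 * y - 4) for x, y in contour]

a = normalize_contour(contour)
b = normalize_contour(moved)
# Largest pointwise gap between the two normalized contours.
max_gap = max(math.dist(p, q) for p, q in zip(a, b))
```

After normalization the two contours coincide up to floating-point error, which is what makes a distance between normalized contours comparable across subjects regardless of where the tumor sits in the image or how large it is.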
Pub Date: 2024-01-01 | Epub Date: 2024-11-05 | DOI: 10.1080/26941899.2024.2407770
Tong Shen, Mingyu DU, Kevin Johnston, Gyorgy Lur, Xiangmin Xu, Hernando Ombao, Michele Guindani, Zhaoxia Yu
Optical imaging of genetically encoded calcium indicators is a powerful tool to record the activity of a large number of neurons simultaneously over a long period of time from freely behaving animals. However, determining the exact time at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remains challenging, especially for calcium imaging data obtained from a longitudinal study. We propose a multi-trial time-varying ℓ0 penalized method to jointly detect spikes and estimate firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies, with the first study showing differential firing rate functions between two behaviors and the second study showing evolving firing rate functions across trials due to learning.
{"title":"TIME-VARYING <i>ℓ</i> <sub>0</sub> OPTIMIZATION FOR SPIKE INFERENCE FROM MULTI-TRIAL CALCIUM RECORDINGS.","authors":"Tong Shen, Mingyu DU, Kevin Johnston, Gyorgy Lur, Xiangmin Xu, Hernando Ombao, Michele Guindani, Zhaoxia Yu","doi":"10.1080/26941899.2024.2407770","DOIUrl":"10.1080/26941899.2024.2407770","url":null,"abstract":"<p><p>Optical imaging of genetically encoded calcium indicators is a powerful tool to record the activity of a large number of neurons simultaneously over a long period of time from freely behaving animals. However, determining the exact time at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remains challenging, especially for calcium imaging data obtained from a longitudinal study. We propose a multi-trial time-varying <math> <mrow><msub><mo>ℓ</mo> <mn>0</mn></msub> </mrow> </math> penalized method to jointly detect spikes and estimate firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. 
We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies, with the first study showing differential firing rate functions between two behaviors and the second study showing evolving firing rate functions across trials due to learning.</p>","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12316062/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144777050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01 | Epub Date: 2024-03-06 | DOI: 10.1080/26941899.2024.2309403
Ganzhong Tian, John Hanfelt, James Lah, Benjamin B Risk
There is no gold standard for the diagnosis of Alzheimer's disease (AD), except from autopsies, which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can simultaneously identify clusters from multiple biomarkers while learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate tobit regressions) to over 3,000 participants from the Emory Goizueta Alzheimer's Disease Research Center and Emory Healthy Brain Study to examine amyloid-beta peptide 1-42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on mixture of regressions with a truncated multivariate Gaussian distribution: software availability; inference; and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile and non-AD pathology. The CSF profiles differed by race, gender and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.
{"title":"Mixture of regressions with multivariate responses for discovering subtypes in Alzheimer's biomarkers with detection limits.","authors":"Ganzhong Tian, John Hanfelt, James Lah, Benjamin B Risk","doi":"10.1080/26941899.2024.2309403","DOIUrl":"10.1080/26941899.2024.2309403","url":null,"abstract":"<p><p>There is no gold standard for the diagnosis of Alzheimer's disease (AD), except from autopsies, which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can simultaneously identify clusters from multiple biomarkers while learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate tobit regressions) to over 3,000 participants from the Emory Goizueta Alzheimer's Disease Research Center and Emory Healthy Brain Study to examine amyloid-beta peptide 1-42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on mixture of regressions with a truncated multivariate Gaussian distribution: software availability; inference; and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile and non-AD pathology. The CSF profiles differed by race, gender and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. 
Notably, African American participants in the AD-like group had significantly lower tau burden.</p>","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11044119/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140869637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
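To make concrete how a detection limit enters the likelihood in the Tian et al. abstract above, here is a minimal sketch of a censored (tobit-type) Gaussian log-likelihood. It is a univariate, single-component simplification, not the multivariate mixture of regressions the paper fits: the function names, the readings, and the parameter values are hypothetical.

```python
import math


def norm_logpdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def norm_cdf(x, mu, sigma):
    """Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def censored_loglik(values, lower_limit, mu, sigma):
    """Log-likelihood of left-censored Gaussian data.

    A reading at or below lower_limit only tells us the true value is
    below the limit, so it contributes log P(X <= lower_limit).
    A reading above the limit contributes the usual log-density.
    """
    total = 0.0
    for v in values:
        if v <= lower_limit:
            total += math.log(norm_cdf(lower_limit, mu, sigma))
        else:
            total += norm_logpdf(v, mu, sigma)
    return total


# Hypothetical biomarker readings; 0.5 is the assumed lower detection limit.
readings = [0.5, 0.5, 1.2, 2.0, 0.9]
ll = censored_loglik(readings, lower_limit=0.5, mu=1.0, sigma=0.6)
```

Treating the two readings at the limit as exact values would instead add two density terms at 0.5, which tends to bias the estimated mean and variance; the censored form avoids that by using the probability mass below the limit.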
Pub Date: 2024-01-01 | Epub Date: 2024-08-02 | DOI: 10.1080/26941899.2024.2383770
James J Yang, Anne Buu
The Singular Spectrum Analysis (SSA) is a useful tool for extracting signals from noisy time series. However, the structural insights provided by SSA are significantly influenced by the choice of window length. While the conventional approach, recommending a larger window length, excels with short to moderately-sized time series, it becomes computationally burdensome for longer time series, potentially amplifying mean squared reconstruction errors. This study addresses this methodological gap by introducing an adaptive sequential SSA method that iteratively selects an optimal window length for efficient extraction of essential eigen-sequences (signals) with minimal reconstruction error. This proposed method is versatile, catering to both short-moderate and lengthy time series. Simulation studies demonstrate its efficacy in scenarios where observed data stem from the sum of two sinusoidal functions and noise. Real data analysis on 6-day heart rate data from a young adult e-cigarette user reveals a distinct clustering of vaping events in the scatter plot of the first and third eigen-sequences, indicating the potential of developing "digital biomarkers" for vaping behavior based on extracted eigen-sequences in future studies. In conclusion, the adaptive sequential SSA method offers a robust and flexible approach for signal extraction in diverse time series applications.
{"title":"Adaptive Sequential Singular Spectrum Analysis: Effective Signal Extraction with Application to Heart Rate Signals Related to E-cigarette Use.","authors":"James J Yang, Anne Buu","doi":"10.1080/26941899.2024.2383770","DOIUrl":"https://doi.org/10.1080/26941899.2024.2383770","url":null,"abstract":"<p><p>The Singular Spectrum Analysis (SSA) is a useful tool for extracting signals from noisy time series. However, the structural insights provided by SSA are significantly influenced by the choice of window length. While the conventional approach, recommending a larger window length, excels with short to moderately-sized time series, it becomes computationally burdensome for longer time series, potentially amplifying mean squared reconstruction errors. This study addresses this methodological gap by introducing an adaptive sequential SSA method that iteratively selects an optimal window length for efficient extraction of essential eigen-sequences (signals) with minimal reconstruction error. This proposed method is versatile, catering to both short-moderate and lengthy time series. Simulation studies demonstrate its efficacy in scenarios where observed data stem from the sum of two sinusoidal functions and noise. Real data analysis on 6-day heart rate data from a young adult e-cigarette user reveals a distinct clustering of vaping events in the scatter plot of the first and third eigen-sequences, indicating the potential of developing \"digital biomarkers\" for vaping behavior based on extracted eigen-sequences in future studies. 
In conclusion, the adaptive sequential SSA method offers a robust and flexible approach for signal extraction in diverse time series applications.</p>","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12064174/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144029748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
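The role of the window length in the Yang and Buu abstract above is easier to see with basic SSA written out. The sketch below is plain fixed-window SSA using NumPy, not the adaptive sequential method the paper proposes: the function name `ssa_reconstruct`, the window of 100, the choice of four components, and the simulated series (two sinusoids plus noise, mirroring the simulation setting the abstract mentions) are all assumptions.

```python
import numpy as np


def ssa_reconstruct(series, window, n_components):
    """Basic SSA: embed, take a truncated SVD, then diagonally average."""
    series = np.asarray(series, dtype=float)
    n = series.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds series[j : j + window].
    traj = np.column_stack([series[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Low-rank approximation built from the leading singular triples.
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging: entries with the same i + j map to one time point.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return recon / counts


# Simulated data: sum of two sinusoids plus Gaussian noise (hypothetical values).
rng = np.random.default_rng(0)
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 25) + 0.5 * np.sin(2 * np.pi * t / 60)
noisy = signal + rng.normal(scale=0.3, size=t.size)

# Each sinusoid occupies two singular components, hence four in total.
denoised = ssa_reconstruct(noisy, window=100, n_components=4)
err_noisy = np.mean((noisy - signal) ** 2)
err_denoised = np.mean((denoised - signal) ** 2)
```

The SVD here acts on a `window` by `n - window + 1` matrix, so its cost grows quickly with the window length for long series, which is the computational burden the abstract refers to.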
The brain’s functional connectome continually rewires throughout an organism’s life. In this study, we sought to elucidate the operational principles of such rewiring in mouse primary motor cortex (M1) by analyzing calcium imaging of layer 2/3 (L2/3) and layer 5 (L5) neuronal activity in M1 of awake mice during a lever-press task learning. Our results show that L2/3 and L5 functional connectomes follow a similar learning-induced rewiring trajectory. More specifically, the connectomes rewire in a biphasic manner, where functional connectivity increases over the first few learning sessions, and then, it is gradually pruned to return to a homeostatic level of network density. We demonstrated that the increase of network connectivity in L2/3 connectomes, but not in L5, generates neuronal co-firing activity that correlates with improved motor performance (shorter cue-to-reward time), while motor performance remains relatively stable throughout the pruning phase. The results show a biphasic rewiring principle that involves the maximization of reward/performance and maintenance of network density. Finally, we demonstrated that the connectome rewiring in L2/3 is clustered around a core set of movement-associated neurons that form a highly interconnected hub in the connectomes, and that the activity of these core neurons stably encodes movement throughout learning.
{"title":"Rewiring Dynamics of Functional Connectomes during Motor-Skill Learning","authors":"Saber Meamardoost, Mahasweta Bhattacharya, Eun Jung Hwang, Chi Ren, Linbing Wang, Claudia Mewes, Ying Zhang, Takaki Komiyama, Rudiyanto Gunawan","doi":"10.1080/26941899.2023.2260431","DOIUrl":"https://doi.org/10.1080/26941899.2023.2260431","url":null,"abstract":"The brain’s functional connectome continually rewires throughout an organism’s life. In this study, we sought to elucidate the operational principles of such rewiring in mouse primary motor cortex (M1) by analyzing calcium imaging of layer 2/3 (L2/3) and layer 5 (L5) neuronal activity in M1 of awake mice during a lever-press task learning. Our results show that L2/3 and L5 functional connectomes follow a similar learning-induced rewiring trajectory. More specifically, the connectomes rewire in a biphasic manner, where functional connectivity increases over the first few learning sessions, and then, it is gradually pruned to return to a homeostatic level of network density. We demonstrated that the increase of network connectivity in L2/3 connectomes, but not in L5, generates neuronal co-firing activity that correlates with improved motor performance (shorter cue-to-reward time), while motor performance remains relatively stable throughout the pruning phase. The results show a biphasic rewiring principle that involves the maximization of reward/performance and maintenance of network density. 
Finally, we demonstrated that the connectome rewiring in L2/3 is clustered around a core set of movement-associated neurons that form a highly interconnected hub in the connectomes, and that the activity of these core neurons stably encodes movement throughout learning.","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134975618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-08-21 | DOI: 10.1080/26941899.2023.2219707
Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Delbare, Myung Hee Lee, Martin T. Wells
Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag R² (LLR²). Importantly, the calculation of the LLR² leverages biological databases that document known interactions amongst genes; this information is automatically used to define informative prior distributions on the ODE model’s parameters. As a result, the LLR² is a biologically-informed metric that can be used to identify clusters or networks of functionally-related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein’s unbiased risk estimate that optimally balance the ODE model’s fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. These results reveal new insights about the dynamics of biological systems.
{"title":"An Empirical Bayes Approach to Estimating Dynamic Models of Co-Regulated Gene Expression","authors":"Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Delbare, Myung Hee Lee, Martin T. Wells","doi":"10.1080/26941899.2023.2219707","DOIUrl":"https://doi.org/10.1080/26941899.2023.2219707","url":null,"abstract":"Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag R2 (LLR2). Importantly, the calculation of the LLR2 leverages biological databases that document known interactions amongst genes; this information is automatically used to define informative prior distributions on the ODE model’s parameters. As a result, the LLR2 is a biologically-informed metric that can be used to identify clusters or networks of functionally-related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein’s unbiased risk estimate that optimally balance the ODE model’s fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. 
These results reveal new insights about the dynamics of biological systems.","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135820909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-07-19 | DOI: 10.1080/26941899.2023.2231061
Yoonji Kim, S. Brodnitz, O. Chkrebtii, F. Bingham
{"title":"Evaluation of Seasonality in Sea Surface Salinity Balance Equation via Function Registration","authors":"Yoonji Kim, S. Brodnitz, O. Chkrebtii, F. Bingham","doi":"10.1080/26941899.2023.2231061","DOIUrl":"https://doi.org/10.1080/26941899.2023.2231061","url":null,"abstract":"","PeriodicalId":72770,"journal":{"name":"Data science in science","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42709365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}