
Data science in science: latest articles

Enhancing Health Research with Machine Learning: Practical Case Studies Using the All of Us Researcher Workbench.
Pub Date: 2025-01-01 Epub Date: 2025-07-03 DOI: 10.1080/26941899.2025.2523871
Jonathan R Holt, Stefanee Tillman, Javan Carter, Edward Preble, Sheryl C Cates, Daniel Brannock, Michael Long, John McCarthy, Leslie Zapata Leiva, Jamboor K Vishwanatha, Toufeeq Syed, Legand Burge, Robert T Mallet, Shelly Kowalczyk, Jennifer D Uhrig, Megan A Lewis

Machine learning is transforming health research by enabling scalable analysis of complex datasets. The All of Us Research Program offers unprecedented access to a wealth of health data. To harness this potential, researchers must navigate the All of Us database structure, develop machine learning skills, and write code effectively. This paper presents case studies designed to teach these skills using the All of Us Researcher Workbench. The case studies cover critical topics such as dataset selection, data cleaning, machine learning applications, and visualization in Python, which together form the foundation of a targeted training program. Pre- and post-program surveys showed that the program significantly improved participants' machine learning competencies. By detailing our approach and findings, we aim to help researchers make full use of the All of Us dataset and thereby advance precision medicine.

Open-access PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362663/pdf/
Citations: 0
Biclustering Multivariate Longitudinal Data with Application to Recovery Trajectories of White Matter After Sport-Related Concussion
Pub Date: 2024-07-16 DOI: 10.1080/26941899.2024.2376535
Caleb Weaver, Luo Xiao, Qiuting Wen, Yu-Chien Wu, Jaroslaw Harezlak
Citations: 0
Model Selection for Exposure-Mediator Interaction.
Pub Date: 2024-01-01 Epub Date: 2024-06-16 DOI: 10.1080/26941899.2024.2360892
Ruiyang Li, Xi Zhu, Seonjoo Lee

In mediation analysis, the exposure often modifies the mediating effect; that is, the exposure and the mediator interact in their effect on the dependent variable. When the mediators are high-dimensional, it is necessary to identify the non-zero mediators (M) and the exposure-by-mediator (X-by-M) interactions. Although several high-dimensional mediation methods can naturally handle X-by-M interactions, little research has addressed preserving the underlying hierarchical structure between the main effects and the interactions. To fill this gap, we develop the XMInt procedure, which selects M and X-by-M interactions in the high-dimensional mediator setting while preserving the hierarchical structure. The method uses a sequential, regularization-based forward-selection approach to identify the mediators and their hierarchy-preserving interactions with the exposure. Our numerical experiments showed promising selection results. We further applied the method to ADNI morphological data and examined the role of cortical thickness and subcortical volumes in the effect of amyloid-beta accumulation on cognitive performance, which could help in understanding brain compensation mechanisms.
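The hierarchy constraint can be illustrated with a small sketch. This is not XMInt: it replaces the regularization-based selection with plain greedy forward selection by least-squares residual sum of squares. The function name `hierarchical_forward_selection`, the outcome model (intercept, exposure, selected mediators and interactions), and the simulated data are all assumptions made for illustration. The sketch shows one thing only: an X-by-M interaction becomes eligible for selection only after the main effect of its mediator has been selected.

```python
import numpy as np


def hierarchical_forward_selection(y, x, M, n_steps):
    """Greedy forward selection of mediators and exposure-by-mediator terms.

    Strong hierarchy: the interaction x * M[:, j] becomes a candidate only
    after the main effect M[:, j] has been selected. Terms are ranked by the
    residual sum of squares of an ordinary least-squares fit (no penalty).
    """
    n, p = M.shape

    def design(terms):
        # Intercept and exposure are always in the outcome model.
        cols = [np.ones(n), x]
        for kind, j in terms:
            cols.append(M[:, j] if kind == "M" else x * M[:, j])
        return np.column_stack(cols)

    def rss(terms):
        A = design(terms)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        return float(resid @ resid)

    selected = []  # entries are ("M", j) for mediators or ("XM", j) for interactions
    for _ in range(n_steps):
        mains = [j for kind, j in selected if kind == "M"]
        candidates = [("M", j) for j in range(p) if j not in mains]
        # Hierarchy: only interactions whose main effect is already selected.
        candidates += [("XM", j) for j in mains if ("XM", j) not in selected]
        if not candidates:
            break
        best = min(candidates, key=lambda term: rss(selected + [term]))
        selected.append(best)
    return selected


# Simulated example: mediator 0 acts both as a main effect and through x * M0.
rng = np.random.default_rng(0)
n, p = 200, 5
x = rng.normal(size=n)
M = rng.normal(size=(n, p))
y = 1 + 0.5 * x + 2 * M[:, 0] + 3 * x * M[:, 0] + 0.1 * rng.normal(size=n)
print(hierarchical_forward_selection(y, x, M, n_steps=2))
```

In the first step, only main effects are candidates. The x * M0 term can enter only in the second step, after M0 has been selected.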

Open-access PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11210705/pdf/
Citations: 0
Assessment of Glioblastoma Multiforme Tumor Heterogeneity via MRI-derived Shape and Intensity Features.
Pub Date: 2024-01-01 Epub Date: 2024-11-07 DOI: 10.1080/26941899.2024.2415690
Yi Tang Chen, Sebastian Kurtek

We use a geometric approach to jointly characterize tumor shape and intensity along the tumor contour, as captured in magnetic resonance images, in the context of glioblastoma multiforme. Key properties of the proposed shape+intensity representation include invariance to translation, scale, rotation and reparameterization, which enable objective characterization and comparison of these crucial tumor features. The representation further allows the user to tune the emphasis placed on the shape and intensity components during registration, comparison and statistical summarization (averaging, computation of overall variance and exploration of variability via principal component analysis). In addition, we define a composite distance that integrates shape and intensity information from two imaging modalities. The proposed framework can be combined with distance-based clustering to discover groups of subjects with distinct survival prognoses. Applied to a cohort of subjects with glioblastoma multiforme, the framework reveals groups with large differences in median survival. We further relate the subjects' cluster memberships to tumor heterogeneity. Our results suggest that tumor shape variation plays an important role in disease prognosis.
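As a rough illustration of how a contour representation can be made invariant to translation and scale, the sketch below computes a square-root velocity function (SRVF), a standard construction in elastic shape analysis. The abstract does not say which representation the authors use, so treating it as SRVF-based is an assumption. The function name `normalized_srvf` and the test contour are hypothetical. The sketch omits rotation and reparameterization invariance, which require Procrustes alignment and a search for an optimal warping, and it also omits the intensity component.

```python
import numpy as np


def normalized_srvf(curve):
    """Square-root velocity function (SRVF) of a sampled planar curve.

    curve: array of shape (n_points, 2), parameterized uniformly on [0, 1].
    Returns q(t) = beta'(t) / sqrt(|beta'(t)|), rescaled to unit L2 norm.
    """
    curve = np.asarray(curve, dtype=float)
    t = np.linspace(0.0, 1.0, len(curve))
    velocity = np.gradient(curve, t, axis=0)  # differentiation drops translation
    speed = np.linalg.norm(velocity, axis=1)
    q = velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    # Riemann approximation of the L2 norm on [0, 1]; dividing by it removes scale.
    l2_norm = np.sqrt(np.mean(np.sum(q**2, axis=1)))
    return q / l2_norm


# A hypothetical closed contour and a scaled, translated copy of it.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
contour = np.column_stack([np.cos(theta) + 0.3 * np.cos(3 * theta), np.sin(theta)])
moved = 4.0 * contour + np.array([10.0, -5.0])
print(np.allclose(normalized_srvf(contour), normalized_srvf(moved)))
```

Scaling the curve by a factor a multiplies q by the square root of a, so the normalization cancels it. Rotating the curve rotates q, which is why full rotation invariance needs an additional alignment step.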

Open-access PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12124832/pdf/
Citations: 0
TIME-VARYING ℓ0 OPTIMIZATION FOR SPIKE INFERENCE FROM MULTI-TRIAL CALCIUM RECORDINGS.
Pub Date: 2024-01-01 Epub Date: 2024-11-05 DOI: 10.1080/26941899.2024.2407770
Tong Shen, Mingyu DU, Kevin Johnston, Gyorgy Lur, Xiangmin Xu, Hernando Ombao, Michele Guindani, Zhaoxia Yu

Optical imaging of genetically encoded calcium indicators is a powerful tool for recording the activity of a large number of neurons simultaneously, over long periods of time, in freely behaving animals. However, determining the exact times at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remain challenging, especially for calcium imaging data obtained in longitudinal studies. We propose a multi-trial, time-varying ℓ0-penalized method that jointly detects spikes and estimates firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies: the first shows differing firing rate functions between two behaviors, and the second shows firing rate functions that evolve across trials as a result of learning.
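For orientation, the sketch below solves a simplified, single-trial version of the ℓ0 spike-inference problem. It assumes an AR(1) calcium model in which fluorescence decays by a known factor `gamma` between spikes and each spike costs a constant penalty `lam`. This is not the multi-trial, time-varying method proposed in the paper. The function name and parameters are hypothetical. The sketch uses a brute-force dynamic program over segment start points, which is O(T³) but easy to check.

```python
import numpy as np


def l0_spike_inference(y, gamma, lam):
    """Exact minimizer of an l0-penalized AR(1) calcium model (single trial).

    Minimizes 0.5 * sum_t (y_t - c_t)**2 + lam * #{t > 0 : c_t != gamma * c_{t-1}}.
    Between spikes the calcium trace decays as c_t = C * gamma**(t - start).
    Returns the sorted start indices of every segment after the first (the spikes).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    best = np.full(T + 1, np.inf)  # best[s]: optimal cost of fitting y[:s]
    best[0] = -lam                 # the first segment is not a spike, so cancel its penalty
    start = np.zeros(T + 1, dtype=int)
    for s in range(1, T + 1):
        for tau in range(s):
            seg = y[tau:s]
            w = gamma ** np.arange(s - tau)
            # The least-squares amplitude C has a closed form, which leaves
            # this residual sum of squares for the segment y[tau:s].
            rss = seg @ seg - (seg @ w) ** 2 / (w @ w)
            cost = best[tau] + 0.5 * rss + lam
            if cost < best[s]:
                best[s] = cost
                start[s] = tau
    # Walk back through the optimal segment starts.
    spikes = []
    s = T
    while s > 0:
        s = int(start[s])
        if s > 0:
            spikes.append(s)
    return sorted(spikes)


# Noise-free toy trace with spikes of height 1 at known times.
gamma = 0.9
spike_times = [10, 25, 40]
calcium = np.zeros(60)
for t in range(60):
    calcium[t] = (gamma * calcium[t - 1] if t > 0 else 0.0) + (1.0 if t in spike_times else 0.0)
print(l0_spike_inference(calcium, gamma, lam=0.01))
```

Each returned index marks a time at which the trace jumps above its exponential decay, that is, an inferred spike. A larger `lam` trades missed small spikes for fewer false detections on noisy data.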

Open-access PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12316062/pdf/
Citations: 0
Mixture of regressions with multivariate responses for discovering subtypes in Alzheimer's biomarkers with detection limits.
Pub Date: 2024-01-01 Epub Date: 2024-03-06 DOI: 10.1080/26941899.2024.2309403
Ganzhong Tian, John Hanfelt, James Lah, Benjamin B Risk

There is no gold standard for diagnosing Alzheimer's disease (AD) other than autopsy, which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can identify clusters from multiple biomarkers while simultaneously learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate tobit regressions) to over 3,000 participants from the Emory Goizueta Alzheimer's Disease Research Center and the Emory Healthy Brain Study to examine amyloid-beta peptide 1-42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on mixtures of regressions with a truncated multivariate Gaussian distribution: software availability, inference, and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile and non-AD pathology. The CSF profiles differed by race, gender and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.
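Detection limits enter a tobit-type likelihood through the normal CDF. A value at or below the limit contributes the probability of falling below that limit rather than a density. The sketch below shows this for a single univariate, left-censored linear regression with one known lower limit. That is a simplifying assumption: the paper's model is a multivariate mixture. In a mixture of such regressions, a per-component likelihood like this one would typically be combined with cluster membership probabilities. The function names are hypothetical.

```python
import math


def _log_normal_pdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _log_normal_cdf(z):
    # Fine for moderate z; a dedicated log-CDF is safer far in the lower tail.
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(y, X, beta, sigma, lower_limit):
    """Log-likelihood of a linear regression with left-censoring at a detection limit.

    Values at or below lower_limit are treated as censored: only the event
    y* <= lower_limit is observed, so they contribute log P(y* <= lower_limit).
    """
    total = 0.0
    for yi, xi in zip(y, X):
        mu = sum(b * v for b, v in zip(beta, xi))
        if yi <= lower_limit:
            total += _log_normal_cdf((lower_limit - mu) / sigma)
        else:
            total += _log_normal_pdf((yi - mu) / sigma) - math.log(sigma)
    return total


# Two observations, the first of which sits at a detection limit of 0.0.
print(tobit_loglik(y=[0.0, 2.0], X=[[1.0], [1.0]], beta=[1.0], sigma=2.0, lower_limit=0.0))
```

Maximizing this function over `beta` and `sigma` gives a censored regression fit. Treating censored values as if they equaled the limit would instead bias the estimated means.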

Open-access PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11044119/pdf/
Citations: 0
Adaptive Sequential Singular Spectrum Analysis: Effective Signal Extraction with Application to Heart Rate Signals Related to E-cigarette Use.
Pub Date: 2024-01-01 Epub Date: 2024-08-02 DOI: 10.1080/26941899.2024.2383770
James J Yang, Anne Buu

Singular Spectrum Analysis (SSA) is a useful tool for extracting signals from noisy time series. However, the structural insights SSA provides depend strongly on the choice of window length. The conventional approach recommends a large window length. It works well for short to moderately sized time series but becomes computationally burdensome for long ones and may inflate the mean squared reconstruction error. This study addresses this methodological gap by introducing an adaptive sequential SSA method that iteratively selects an optimal window length to efficiently extract the essential eigen-sequences (signals) with minimal reconstruction error. The method is versatile and suits both short-to-moderate and long time series. Simulation studies demonstrate its efficacy when the observed data are the sum of two sinusoidal functions and noise. A real-data analysis of 6 days of heart rate data from a young adult e-cigarette user reveals a distinct clustering of vaping events in the scatter plot of the first and third eigen-sequences. This suggests that future studies could develop "digital biomarkers" of vaping behavior from the extracted eigen-sequences. In conclusion, the adaptive sequential SSA method offers a robust and flexible approach to signal extraction across diverse time series applications.
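To make the role of the window length concrete, here is a sketch of basic SSA for a fixed window length `L`: embedding, singular value decomposition, and diagonal averaging. It does not implement the adaptive sequential window selection described above. The function name `ssa_components` and the example series are assumptions.

```python
import numpy as np


def ssa_components(x, L, r):
    """Basic singular spectrum analysis with window length L.

    Returns an (r, N) array whose rows are the first r elementary
    reconstructed components of the series x (requires r <= min(L, N - L + 1)).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    # Trajectory (Hankel) matrix: entry [i, j] is x[i + j].
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    counts = np.zeros(N)
    for i in range(L):
        counts[i:i + K] += 1  # number of matrix entries on each anti-diagonal
    components = np.zeros((r, N))
    for k in range(r):
        Xk = s[k] * np.outer(U[:, k], Vt[k])  # k-th rank-one piece of X
        # Diagonal averaging: map each anti-diagonal back to one time point.
        for i in range(L):
            components[k, i:i + K] += Xk[i]
        components[k] /= counts
    return components


t = np.arange(100)
signal = np.sin(2 * np.pi * t / 12)
noisy = signal + 0.3 * np.random.default_rng(1).normal(size=t.size)
estimate = ssa_components(noisy, L=24, r=2).sum(axis=0)
print(np.mean((estimate - signal) ** 2), np.mean((noisy - signal) ** 2))
```

The trajectory matrix of a pure sinusoid has rank 2, so its first two components reconstruct it exactly. With noise added, the leading pair typically approximates the sinusoid, while the remaining components absorb most of the noise. Summing all components always returns the original series.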

Open-access PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12064174/pdf/
Citations: 0
Rewiring Dynamics of Functional Connectomes during Motor-Skill Learning
Pub Date: 2023-10-05 DOI: 10.1080/26941899.2023.2260431
Saber Meamardoost, Mahasweta Bhattacharya, Eun Jung Hwang, Chi Ren, Linbing Wang, Claudia Mewes, Ying Zhang, Takaki Komiyama, Rudiyanto Gunawan
The brain's functional connectome continually rewires throughout an organism's life. In this study, we sought to elucidate the operating principles of such rewiring in the mouse primary motor cortex (M1) by analyzing calcium imaging of layer 2/3 (L2/3) and layer 5 (L5) neuronal activity in M1 of awake mice as they learned a lever-press task. Our results show that L2/3 and L5 functional connectomes follow a similar learning-induced rewiring trajectory. More specifically, the connectomes rewire in a biphasic manner: functional connectivity increases over the first few learning sessions and is then gradually pruned back to a homeostatic level of network density. We demonstrated that the increase in network connectivity in L2/3 connectomes, but not in L5, generates neuronal co-firing activity that correlates with improved motor performance (shorter cue-to-reward time), while motor performance remains relatively stable throughout the pruning phase. The results point to a biphasic rewiring principle that involves maximizing reward/performance while maintaining network density. Finally, we demonstrated that connectome rewiring in L2/3 is clustered around a core set of movement-associated neurons that form a highly interconnected hub in the connectomes, and that the activity of these core neurons stably encodes movement throughout learning.
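One simple way to build a binary functional connectome from calcium-imaging traces is to threshold pairwise correlations. Network density can then be tracked across learning sessions. The abstract does not specify how the connectomes were inferred, so the correlation-threshold construction, the function names, and the toy traces below are assumptions.

```python
import numpy as np


def functional_connectome(activity, threshold):
    """Binary functional network from pairwise Pearson correlations.

    activity: array of shape (n_neurons, n_timepoints), one row per neuron.
    Neurons i and j are linked when |corr(i, j)| >= threshold.
    """
    corr = np.corrcoef(activity)
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency


def network_density(adjacency):
    """Fraction of possible undirected edges that are present."""
    n = adjacency.shape[0]
    n_edges = adjacency.sum() / 2  # each undirected edge appears twice in the matrix
    return n_edges / (n * (n - 1) / 2)


# Toy traces: neuron 1 is a scaled copy of neuron 0; neuron 2 has an
# absolute correlation of about 0.45 with each of them.
traces = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
])
adj = functional_connectome(traces, threshold=0.5)
print(network_density(adj))
```

Applying `network_density` to connectomes built from successive sessions would yield a density trajectory of the kind the abstract describes. The threshold choice strongly affects absolute density values.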
Citations: 0
An Empirical Bayes Approach to Estimating Dynamic Models of Co-Regulated Gene Expression
Pub Date: 2023-08-21 DOI: 10.1080/26941899.2023.2219707
Sara Venkatraman, Sumanta Basu, Andrew G. Clark, Sofie Delbare, Myung Hee Lee, Martin T. Wells
Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns, because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag R² (LLR²). Importantly, the calculation of the LLR² leverages biological databases that document known interactions among genes; this information is automatically used to define informative prior distributions on the ODE model's parameters. As a result, the LLR² is a biologically informed metric that can be used to identify clusters or networks of functionally related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein's unbiased risk estimate that optimally balance the ODE model's fit to both the data and external biological information. Using real gene expression data, we demonstrate that our methodology recovers interpretable gene clusters and sparse networks. These results offer new insights into the dynamics of biological systems.
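The abstract does not give the LLR² formula. The sketch below only illustrates the general idea of shrinking regression coefficients toward an informative prior mean rather than toward zero, then reporting the R² of the shrunk fit. It assumes a Gaussian prior with a known noise variance, so the posterior mean has a closed form. It also assumes a hypothetical lead-lag design (integral of gene B, gene B, time, intercept) and a fixed shrinkage weight `lam`. Choosing `lam` via Stein's unbiased risk estimate is not shown, and the function name `shrunk_fit_r2` is hypothetical.

```python
import numpy as np


def shrunk_fit_r2(X, y, prior_mean, lam):
    """Posterior-mean coefficients under beta ~ N(prior_mean, (sigma^2 / lam) I)
    with known noise variance sigma^2, plus the R^2 of the resulting fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    m = np.asarray(prior_mean, dtype=float)
    # (X'X + lam I) beta = X'y + lam m: shrinks toward the prior mean, not toward zero.
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * m)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical lead-lag design: gene A follows the running integral of gene B.
t = np.linspace(0.0, 5.0, 26)
expr_b = np.sin(t)
# Cumulative trapezoid integral of gene B from 0 to t.
int_b = np.concatenate([[0.0], np.cumsum((expr_b[1:] + expr_b[:-1]) / 2 * np.diff(t))])
X = np.column_stack([int_b, expr_b, t, np.ones_like(t)])
true_beta = np.array([1.5, 0.0, 0.2, 1.0])
expr_a = X @ true_beta  # noise-free toy data
beta, r2 = shrunk_fit_r2(X, expr_a, prior_mean=true_beta, lam=10.0)
print(beta, r2)
```

When the prior mean is far from the truth, a large `lam` pulls the coefficients away from the least-squares fit and lowers R². Balancing that trade-off is the role a data-driven shrinkage parameter would play.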
Citations: 0
Evaluation of Seasonality in Sea Surface Salinity Balance Equation via Function Registration
Pub Date : 2023-07-19 DOI: 10.1080/26941899.2023.2231061
Yoonji Kim, S. Brodnitz, O. Chkrebtii, F. Bingham
Citations: 0
Journal: Data science in science