
ArXiv: Latest Articles

Augmenting Human Expertise in Weighted Ensemble Simulations through Deep Learning based Information Bottleneck.
Pub Date : 2024-11-15
Dedi Wang, Pratyush Tiwary

The weighted ensemble (WE) method stands out as a widely used segment-based sampling technique renowned for its rigorous treatment of kinetics. The WE framework typically involves first mapping the configuration space onto a low-dimensional collective variable (CV) space and then partitioning it into bins. The efficacy of WE simulations heavily depends on the selection of CVs and binning schemes. The recently proposed State Predictive Information Bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and guiding enhanced sampling in an iterative manner. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs, which enhance sampling in explored regions, with expert-based CVs, which guide exploration in regions of interest, synergizing the strengths of both methods. Through benchmarking on alanine dipeptide and chignolin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest and reduces run-to-run variance. Moreover, our integration of the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways and offering direct visualization of dynamics.
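To make the binning step above concrete, here is a minimal sketch of one weighted ensemble resampling pass. It assumes a single one-dimensional CV with fixed, hand-chosen bin edges, a fixed target number of walkers per bin, and strictly positive weights, and it uses a simple split/merge rule (split the heaviest walker, merge the two lightest). The hybrid SPIB/expert CVs and the dynamics propagation between resampling steps described in the paper are not modeled, and all names here (`Walker`, `resample_bin`, `we_resample`) are hypothetical.

```python
import bisect
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Walker:
    cv: float      # position along a single 1D collective variable (assumption)
    weight: float  # statistical weight carried by this trajectory segment


def resample_bin(walkers, target, rng):
    """Split/merge the walkers of one bin until it holds `target` walkers."""
    walkers = list(walkers)
    # Split: replace the heaviest walker by two copies with half its weight each.
    while 0 < len(walkers) < target:
        walkers.sort(key=lambda w: w.weight)
        heavy = walkers.pop()
        half = heavy.weight / 2.0
        walkers.extend([Walker(heavy.cv, half), Walker(heavy.cv, half)])
    # Merge: combine the two lightest walkers into one until the bin is at target.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w.weight)
        a, b = walkers[0], walkers[1]
        total = a.weight + b.weight  # weights assumed positive, so total > 0
        # Keep one of the pair with probability proportional to its weight;
        # the survivor inherits the combined weight, so total weight is conserved.
        survivor = a if rng.random() < a.weight / total else b
        walkers = [Walker(survivor.cv, total)] + walkers[2:]
    return walkers


def we_resample(walkers, edges, target, rng):
    """Bin walkers by sorted interior `edges`, then resample every occupied bin."""
    bins = defaultdict(list)
    for w in walkers:
        bins[bisect.bisect_right(edges, w.cv)].append(w)
    resampled = []
    for index in sorted(bins):
        resampled.extend(resample_bin(bins[index], target, rng))
    return resampled
```

For example, with interior edges `[0.5, 1.0]` and `target=2`, walkers at CV values 0.1 and 0.15 already fill the first bin, while a lone walker at 1.2 with weight 0.2 is split into two walkers of weight 0.1. Because a merge keeps one walker with probability proportional to its weight and hands it the combined weight, the total probability is conserved in every pass.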

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213147/pdf/
Citations: 0
Deciphering SCN2A: A comprehensive review of rodent models of Scn2a dysfunction.
Pub Date : 2024-11-15
Katelin E J Scott, Maria F Hermosillo Arrieta, Aislinn J Williams
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601800/pdf/
Citations: 0
Uncertainty quantification of receptor ligand binding sites prediction.
Pub Date : 2024-11-15
Nanjie Chen, Dongliang Yu, Dmitri Beglov, Mark Kon, Julio Enrique Castrillon-Candas

Recent advancements in protein docking site prediction have highlighted the limitations of traditional rigid docking algorithms, such as PIPER, which often neglect critical stochastic elements such as solvent-induced fluctuations. These oversights can lead to inaccuracies in identifying viable docking sites due to the complexity of high-dimensional, stochastic energy manifolds with low regularity. To address this issue, our research introduces a novel model in which the molecular shapes of ligands and receptors are represented using multivariate Karhunen-Loève (KL) expansions. This method effectively captures the stochastic nature of energy manifolds, allowing for a more accurate representation of molecular interactions. Developed as a plugin for PIPER, our scientific computing software enhances the platform, delivering robust uncertainty measures for the energy manifolds of ranked binding sites. Our results demonstrate that top-ranked binding sites, characterized by lower uncertainty in the stochastic energy manifold, align closely with actual docking sites. Conversely, sites with higher uncertainty correlate with less optimal docking positions. This distinction not only validates our approach but also sets a new standard in protein docking predictions, offering substantial implications for future molecular interaction research and drug development.
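The paper's multivariate KL model of ligand and receptor shapes is not reproduced here. As a minimal illustration of what a truncated Karhunen-Loève expansion does, the sketch below uses the textbook case of standard Brownian motion on [0, 1], whose KL eigenpairs are known in closed form: λ_k = 1/((k − 1/2)²π²) and φ_k(t) = √2 sin((k − 1/2)πt). A realization of the truncated field is W_M(t) = Σ_{k=1}^{M} √λ_k ξ_k φ_k(t) with independent standard normal ξ_k. The choice of Brownian motion and all function names are assumptions made for illustration.

```python
import math
import random


def kl_eigenpair(k):
    """k-th (1-based) KL eigenvalue and eigenfunction of Brownian motion on [0, 1]."""
    omega = (k - 0.5) * math.pi
    eigval = 1.0 / omega ** 2
    eigfun = lambda t: math.sqrt(2.0) * math.sin(omega * t)
    return eigval, eigfun


def sample_path(ts, n_terms, rng):
    """One realization of W_M(t) = sum_k sqrt(lam_k) * xi_k * phi_k(t) at the times `ts`."""
    xis = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    path = [0.0] * len(ts)
    for k, xi in enumerate(xis, start=1):
        lam, phi = kl_eigenpair(k)
        for i, t in enumerate(ts):
            path[i] += math.sqrt(lam) * xi * phi(t)
    return path


def truncated_variance(t, n_terms):
    """Pointwise variance of W_M(t); it converges to Var W(t) = t as n_terms grows."""
    return sum(lam * phi(t) ** 2 for lam, phi in map(kl_eigenpair, range(1, n_terms + 1)))


def captured_fraction(n_terms):
    """Share of the total variance (the integral of t over [0, 1], i.e. 1/2) kept by n_terms modes."""
    return sum(kl_eigenpair(k)[0] for k in range(1, n_terms + 1)) / 0.5
```

A single mode already retains 8/π² (about 81%) of the total variance and five modes retain roughly 96%. This rapid eigenvalue decay is what makes a truncated KL expansion a compact, low-dimensional way to represent a random field.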

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10854274/pdf/
Citations: 0
Effect of Parametric Variation of Chordae Tendineae Structure on Simulated Atrioventricular Valve Closure.
Pub Date : 2024-11-14
Nicolas R Mangine, Devin W Laurence, Patricia M Sabin, Wensi Wu, Christian Herz, Christopher N Zelonis, Justin S Unger, Csaba Pinter, Andras Lasso, Steve A Maas, Jeffrey A Weiss, Matthew A Jolley

Purpose: Many approaches have been used to model chordae tendineae geometries in finite element simulations of atrioventricular heart valves. Unfortunately, current "functional" chordae tendineae geometries lack the fidelity (e.g., branching) that would help inform clinical decisions. The objectives of this work are (i) to improve the fidelity of synthetic chordae tendineae geometries by accounting for branching and (ii) to define how chordae tendineae geometry affects finite element simulations of valve closure.

Methods: In this work, we develop an open-source method to construct synthetic chordae tendineae geometries in the SlicerHeart Extension of 3D Slicer. The generated geometries are then used in FEBio finite element simulations of atrioventricular valve function to evaluate how variations in chordae tendineae geometry influence valve behavior. Effects are evaluated using functional and mechanical metrics.

Results: Our findings demonstrated that altering the chordae tendineae geometry of a stereotypical mitral valve led to changes in clinically relevant valve metrics (regurgitant orifice area, contact area, and billowing volume) and valve mechanics (first principal strains). Specifically, cross-sectional area had the most influence over valve closure metrics, followed by chordae tendineae density, length, radius, and branches. We then used this information to showcase the flexibility of our new workflow by altering the chordae tendineae geometry of two additional valve models (a mitral valve with annular dilation and a tricuspid valve) to improve finite element predictions.

Conclusion: This study presents a flexible, open-source method for generating synthetic chordae tendineae with realistic branching structures. Further, we establish relationships between chordae tendineae geometry and valve functional/mechanical metrics. This contribution enriches our open-source workflow and brings finite element simulations closer to use in patient-specific clinical settings.
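The SlicerHeart construction is not described in detail in the abstract, so the following is only a hypothetical sketch of how a branched synthetic chorda could be parameterized. A single trunk runs from a papillary muscle tip to a branch point placed a fraction of the way toward the centroid of the leaflet insertion points, and one straight branch then runs to each insertion point. The parameters `branch_fraction` and `radius` are assumptions chosen to mirror the geometric quantities varied in the study (length, radius, cross-sectional area, branching). The class and function names are illustrative and are not part of SlicerHeart or FEBio.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class Segment:
    start: Point
    end: Point
    radius: float  # assumed circular cross-section

    @property
    def length(self) -> float:
        return math.dist(self.start, self.end)

    @property
    def cross_sectional_area(self) -> float:
        return math.pi * self.radius ** 2


def branched_chorda(tip: Point, insertions: list[Point],
                    branch_fraction: float = 0.5, radius: float = 0.4) -> list[Segment]:
    """Hypothetical branched chorda: one trunk from `tip`, then one branch per insertion point."""
    if not insertions:
        raise ValueError("need at least one insertion point")
    if not 0.0 < branch_fraction < 1.0:
        raise ValueError("branch_fraction must lie strictly between 0 and 1")
    n = len(insertions)
    centroid = tuple(sum(p[axis] for p in insertions) / n for axis in range(3))
    # The branch point sits `branch_fraction` of the way from the tip toward the centroid.
    branch_point = tuple(t + branch_fraction * (c - t) for t, c in zip(tip, centroid))
    trunk = Segment(tip, branch_point, radius)
    branches = [Segment(branch_point, p, radius) for p in insertions]
    return [trunk] + branches


def total_length(segments: list[Segment]) -> float:
    """Summed centerline length of all segments."""
    return sum(s.length for s in segments)
```

Each `Segment` could then be meshed as a cable or beam element for a finite element solver; this sketch stops at the geometry, and a real workflow would also need leaflet attachment and material definitions.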

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601809/pdf/
Citations: 0
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking.
Pub Date : 2024-11-14
Yunchao Lance Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles David Weaver, Jens Meiler, Tyler Derr

While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing progress and the transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: WelQrate Dataset Collection - we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high-quality data; WelQrate Evaluation Framework - we propose a standardized model evaluation framework covering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides reliable benchmarking for drug discovery experts conducting real-world virtual screening; Benchmarking - we evaluate model performance through various research questions using the WelQrate dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting WelQrate as the gold standard in small molecule drug discovery benchmarking. The WelQrate dataset collection, curation code, and experimental scripts are all publicly available at WelQrate.org.
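The abstract does not name the evaluation metrics, so the following shows one metric that is common in virtual screening, the enrichment factor among the top-k ranked compounds, as an assumption rather than WelQrate's exact definition: EF_k = (actives in the top k / k) / (total actives / total compounds). The function name and the toy data in the note below are hypothetical.

```python
def enrichment_factor(scores, labels, top_k):
    """Enrichment factor among the `top_k` highest-scoring compounds.

    EF_k = (actives in top k / k) / (all actives / all compounds).
    `labels` holds 1 for a confirmed active and 0 for an inactive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 1 <= top_k <= len(scores):
        raise ValueError("top_k must be between 1 and the number of compounds")
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Rank by descending score; sorted() is stable, so ties keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:top_k])
    return (hits / top_k) / (n_actives / len(scores))
```

For a hypothetical ten-compound list whose two actives are ranked first and third, `enrichment_factor(scores, labels, top_k=2)` returns 2.5. An EF of 1 means the ranking does no better than random ordering, and values above 1 mean actives are concentrated near the top of the list.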

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601797/pdf/
Citations: 0
An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease.
Pub Date : 2024-11-14
Giorgio Dolci, Federica Cruciani, Md Abdur Rahaman, Anees Abrol, Jiayu Chen, Zening Fu, Ilaria Boscolo Galazzo, Gloria Menegaz, Vince D Calhoun

Alzheimer's disease (AD) is the most prevalent form of dementia, affecting millions worldwide with a progressive decline in cognitive abilities. The AD continuum encompasses a prodromal stage known as Mild Cognitive Impairment (MCI), in which patients may either progress to AD (MCIc) or remain stable (MCInc). Understanding the underlying mechanisms of AD requires complementary analyses relying on different data sources, which has led to the development of multimodal deep learning models. In this study, we leveraged structural and functional Magnetic Resonance Imaging (sMRI/fMRI) to investigate disease-induced changes in grey matter and functional network connectivity. Moreover, considering AD's strong genetic component, we introduced Single Nucleotide Polymorphisms (SNPs) as a third channel. Given such diverse inputs, missing one or more modalities is a typical concern for multimodal methods. We hence propose a novel deep learning-based classification framework in which a generative module employing Cycle Generative Adversarial Networks (cGAN) imputes missing data within the latent space. Additionally, we adopted an Explainable Artificial Intelligence (XAI) method, Integrated Gradients (IG), to extract input features' relevance, enhancing our understanding of the learned representations. Two critical tasks were addressed: AD detection and MCI conversion prediction. Experimental results showed that our framework reached state-of-the-art performance in classifying cognitively normal (CN) vs AD subjects, with an average test accuracy of 0.926 ± 0.02. For the MCInc vs MCIc task, we achieved an average prediction accuracy of 0.711 ± 0.01 using the model pre-trained on CN and AD. The interpretability analysis revealed that classification performance was driven by significant grey matter modulations in cortical and subcortical brain areas well known for their association with AD.
Moreover, impairments in sensory-motor and visual resting state network connectivity along the disease continuum, as well as mutations in SNPs defining biological processes linked to endocytosis, amyloid-beta, and cholesterol, were identified as contributors to the achieved performance. Overall, our integrative deep learning approach shows promise for AD detection and MCI prediction while offering important biological insights.
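Integrated Gradients attributes a prediction f(x) to each input feature by integrating the gradient along a straight path from a baseline x′ to the input: IG_i(x) = (x_i − x′_i) ∫₀¹ ∂f(x′ + α(x − x′))/∂x_i dα. The paper applies IG to its trained multimodal network. The sketch below instead uses a hypothetical toy function with a hand-written gradient and an all-zero baseline (both assumptions), and approximates the integral with a midpoint Riemann sum.

```python
def integrated_gradients(grad_f, x, baseline, steps=64):
    """Integrated Gradients via a midpoint Riemann sum over `steps` points on the straight path."""
    if len(x) != len(baseline):
        raise ValueError("x and baseline must have the same length")
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sums = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval of [0, 1]
        point = [bi + alpha * di for bi, di in zip(baseline, delta)]
        for i, g in enumerate(grad_f(point)):
            grad_sums[i] += g
    # Average gradient along the path, scaled by the input-minus-baseline difference.
    return [di * gs / steps for di, gs in zip(delta, grad_sums)]


# Hypothetical toy "model" with an analytic gradient, standing in for a trained network.
def toy_f(x):
    return 2.0 * x[0] ** 2 + 0.5 * x[0] * x[1] - 1.5 * x[2]


def toy_grad(x):
    return [4.0 * x[0] + 0.5 * x[1], 0.5 * x[0], -1.5]
```

The attributions satisfy IG's completeness property: they sum to f(x) − f(x′). For this quadratic toy, the gradient varies linearly along the path, so the midpoint rule is exact up to rounding. For a deep network, the gradient would come from automatic differentiation, and more steps are usually needed.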

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213156/pdf/
Citations: 0
MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI.
Pub Date : 2024-11-14
Nancy R Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, Dominika Ciupek, Antonio Tristán Vega, Santiago Aja-Fernández, Maciej Malawski, Gani Ouedraogo, Julia Machnio, Christian Ewert, Paul M Thompson, Neda Jahanshad, Eleftherios Garyfallidis, Bennett A Landman

White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. Specifically, there is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions, and were tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Harmonized submissions were evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x more subjects than the prior harmonization challenge, MUSHAC, and 100x more than SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition, and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, the microstructure measures axial, mean, and radial diffusivity (AD, MD, RD), as well as bundle length, connectome density, efficiency, and path length, are least biased by these acquisition differences.
A machine learning approach that learned voxel-wise cross-acquisition relationships was the most effective at harmonizing connectomic, microstructure, and macrostructure features, but it requires that the same subject be scanned at each site and that the scans be co-registered. NeSH, a spatial and angular resampling method, was also effective and has a generalizable framework that does not rely on co-registration. Our code is available at https://github.com/nancynewlin-masi/QuantConn/.
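The microstructure measures named above (fractional anisotropy, AD, MD, RD) are standard scalars derived from the three eigenvalues λ1 ≥ λ2 ≥ λ3 of a fitted diffusion tensor: MD = (λ1 + λ2 + λ3)/3, AD = λ1, RD = (λ2 + λ3)/2, and FA = √(3/2) · √(Σ(λi − MD)²) / √(Σλi²). The sketch below computes them. It also adds a simple symmetric percent difference for comparing one measure between two acquisitions. That comparison is an illustrative assumption, not the challenge's evaluation protocol, and the function names are hypothetical.

```python
import math


def dti_scalars(eigenvalues):
    """FA, MD, AD and RD from the three eigenvalues of a diffusion tensor."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)  # enforce l1 >= l2 >= l3
    md = (l1 + l2 + l3) / 3.0      # mean diffusivity
    ad = l1                        # axial diffusivity: along the principal direction
    rd = (l2 + l3) / 2.0           # radial diffusivity: perpendicular to it
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    spread = math.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    fa = 0.0 if norm == 0.0 else math.sqrt(1.5) * spread / norm
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


def percent_difference(a, b):
    """Symmetric percent difference of one measure obtained from two acquisitions."""
    mean = (a + b) / 2.0
    return 0.0 if mean == 0.0 else 100.0 * abs(a - b) / abs(mean)
```

Eigenvalues are commonly reported in units of 10⁻³ mm²/s, so MD, AD, and RD share those units, while FA is dimensionless and lies between 0 (isotropic) and 1 (diffusion along a single axis).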

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601790/pdf/
Citations: 0
A new computational model for quantifying blood flow dynamics across myogenically-active cerebral arterial networks.
Pub Date : 2024-11-13
Alberto Coccarelli, Ioannis Polydoros, Alex Drysdale, Osama F Harraz, Chennakesava Kadapa

Cerebral autoregulation plays a key physiological role by limiting blood flow changes in the face of pressure fluctuations. Although the involved cellular processes are mechanically driven, the quantification of haemodynamic forces in vivo remains extremely difficult and uncertain. In this work, we propose a novel computational framework for evaluating the blood flow dynamics across networks of myogenically active cerebral arteries, which can modulate their muscular tone to stabilize flow (and perfusion pressure) as well as to limit vascular intramural stress. The introduced framework is built on contractile (myogenically active) vascular wall mechanics and blood flow dynamics models, which can be numerically coupled either weakly or strongly. We investigate the time dependency of the vascular wall response to pressure changes at both single vessel and network levels. The robustness of the model was assessed by considering different types of inlet signals and numerical settings in an idealized vascular network formed by a middle cerebral artery and three generations of its branches. For the vessel size and boundary conditions considered, weak coupling ensured accurate results with a lower computational cost. To complete the analysis, we evaluated the effect of an upstream pressure surge on the haemodynamics of the vascular network. This provided a clear quantitative picture of how pressure and flow are redistributed across each vessel generation upon inlet pressure changes. This work paves the way for future combined experimental-computational studies aiming to decipher cerebral autoregulation.
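
The weak versus strong coupling mentioned above can be illustrated with a deliberately simplified, hypothetical example. The sketch below is not the authors' model: it replaces their vessel-network solver with a single lumped vessel in series with a fixed downstream resistance, a Poiseuille-type resistance R_v = C/r^4, and a linear toy myogenic law in which the equilibrium radius shrinks as intravascular pressure rises. All parameter names and values (R0, K, P_REF, C_VESSEL, R_DOWN, TAU) and the step-shaped inlet signal are illustrative assumptions. Weak coupling solves the flow once with the lagged radius and then updates the wall; strong coupling alternates the two sub-problems within each time step until the radius converges.

```python
# Toy, dimensionless parameters (hypothetical; not taken from the paper).
R0 = 1.0          # reference (passive) radius
K = 0.004         # myogenic gain: fractional radius change per unit pressure
P_REF = 50.0      # pressure at which the equilibrium radius equals R0
C_VESSEL = 1.0    # Poiseuille-type constant, vessel resistance R_v = C_VESSEL / r**4
R_DOWN = 1.0      # fixed downstream resistance (outlet pressure = 0)
TAU = 5.0         # wall relaxation time constant


def flow_solve(p_in, r):
    """Flow sub-problem: return flow and mean vessel pressure for a given radius."""
    r_v = C_VESSEL / r**4
    q = p_in / (r_v + R_DOWN)
    p_mid = p_in - 0.5 * q * r_v  # pressure at the vessel midpoint, felt by the wall
    return q, p_mid


def target_radius(p_mid):
    """Toy myogenic law: higher pressure gives a smaller equilibrium radius."""
    return R0 * (1.0 - K * (p_mid - P_REF))


def wall_update(r_old, p_mid, dt):
    """Backward-Euler step of dr/dt = (target_radius(p_mid) - r) / TAU, p_mid held fixed."""
    a = dt / TAU
    return (r_old + a * target_radius(p_mid)) / (1.0 + a)


def simulate(p_in_of_t, t_end, dt, strong=False, tol=1e-12, max_iter=100):
    r = R0
    radii, flows = [], []
    for n in range(1, int(round(t_end / dt)) + 1):
        p_in = p_in_of_t(n * dt)
        if strong:
            # Strong coupling: alternate flow and wall solves until the radius converges.
            r_new = r
            for _ in range(max_iter):
                _, p_mid = flow_solve(p_in, r_new)
                r_next = wall_update(r, p_mid, dt)
                converged = abs(r_next - r_new) < tol
                r_new = r_next
                if converged:
                    break
        else:
            # Weak (staggered) coupling: one flow solve with the lagged radius, one wall update.
            _, p_mid = flow_solve(p_in, r)
            r_new = wall_update(r, p_mid, dt)
        r = r_new
        radii.append(r)
        flows.append(flow_solve(p_in, r)[0])
    return radii, flows


def inlet_pressure(t):
    """Hypothetical inlet signal: a step from 80 to 120 at t = 100."""
    return 80.0 if t < 100.0 else 120.0


r_weak, q_weak = simulate(inlet_pressure, 200.0, 0.1)
r_strong, q_strong = simulate(inlet_pressure, 200.0, 0.1, strong=True)
```

In this toy setting both schemes settle on the same steady radius after the pressure step, and the myogenic constriction keeps the relative flow increase well below the 50% pressure increase. That the cheaper weak scheme already matches the strong one here is loosely in line with the abstract's finding for the cases it considered, but says nothing about the full network model.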

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601795/pdf/
Citations: 0
Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects.
Pub Date : 2024-11-13
Aixa X Andrade, Son Nguyen, Albert Montillo

Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variations, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell appears across batches, enhancing interpretability. Retaining both fixed and random effect latent spaces improves classification accuracy. We applied our framework to three datasets spanning the cardiovascular system (Healthy Heart), Autism Spectrum Disorder (ASD), and Acute Myeloid Leukemia (AML). With 147 batches in the Healthy Heart dataset, far more than is typical, we tested our framework's ability to handle many batches. In the ASD dataset, our approach captured donor heterogeneity between autistic and healthy individuals. In the AML dataset, it distinguished donor heterogeneity despite missing cell types and despite diseased donors exhibiting both healthy and malignant cells. These results highlight our framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.
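
A linear analogue may help clarify the fixed/random-effects split described above. The sketch below is not the MEDL autoencoder; it is a hypothetical one-way random-effects model, x = mu + b_batch + noise, applied gene by gene, where mu plays the role of the batch-invariant (fixed) component and each batch offset b is a shrunken, BLUP-style random effect. The variance components sigma_b2 and sigma_e2 are assumed known, the grand mean is used as a simple estimate of mu, and the data are synthetic. Adding a batch's offset back to a corrected cell mimics, in linear form, the idea of showing how the same cell would appear in each batch.

```python
import numpy as np


def split_fixed_random(X, batch, sigma_b2, sigma_e2):
    """Per-gene one-way random-effects split of X (cells x genes).

    Assumed model: X[i, g] = mu[g] + b[batch[i], g] + noise, with
    b ~ N(0, sigma_b2) and noise ~ N(0, sigma_e2), both variances known.
    Returns batch labels, fixed effect mu, shrunken offsets, and batch-corrected X.
    """
    labels, idx = np.unique(batch, return_inverse=True)
    mu = X.mean(axis=0)  # fixed (batch-invariant) component: grand mean per gene
    offsets = np.zeros((len(labels), X.shape[1]))
    for k in range(len(labels)):
        Xk = X[idx == k]
        n_k = Xk.shape[0]
        shrink = n_k * sigma_b2 / (n_k * sigma_b2 + sigma_e2)  # BLUP shrinkage factor
        offsets[k] = shrink * (Xk.mean(axis=0) - mu)           # random-effect estimate
    X_fixed = X - offsets[idx]  # remove each cell's estimated batch offset
    return labels, mu, offsets, X_fixed


def view_in_each_batch(x_fixed, offsets):
    """Rows: how one corrected cell would look in each batch (linear analogue)."""
    return x_fixed[None, :] + offsets


# Synthetic example: 3 batches x 200 cells x 5 genes.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1, 2], 200)
true_offsets = rng.normal(0.0, 2.0, size=(3, 5))
X = 5.0 + true_offsets[batch] + rng.normal(0.0, 1.0, size=(600, 5))
labels, mu, offsets, X_fixed = split_fixed_random(X, batch, sigma_b2=4.0, sigma_e2=1.0)
```

Keeping `offsets` rather than discarding it is the linear counterpart of retaining a random-effects representation; in a predictive model the two parts could, for instance, be concatenated as features.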

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601787/pdf/
Citations: 0
High fitness paths can connect proteins with low sequence overlap.
Pub Date : 2024-11-13
Pranav Kantroo, Günter P Wagner, Benjamin B Machta

The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. Consecutive sequences along these paths differ by a single residue change; substitutions, insertions and deletions are all admissible moves. Their fitness is evaluated using the protein language model ESM2 and kept as high as possible within the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even adopt the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.
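
The path construction described above can be sketched as a greedy search. This is an illustrative assumption, not the authors' algorithm: at every step it enumerates all single substitutions, insertions and deletions, keeps those that reduce the Levenshtein distance to the target by exactly one, and picks the candidate with the highest score. The score `toy_fitness` is a hypothetical placeholder for a protein language model score such as one derived from ESM2; it does not call ESM2.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def edit_distance(a, b):
    """Levenshtein distance (substitution, insertion, deletion all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def single_edits(seq):
    """All sequences exactly one substitution, insertion or deletion away from seq."""
    out = set()
    for i in range(len(seq) + 1):
        for aa in AMINO_ACIDS:
            out.add(seq[:i] + aa + seq[i:])          # insertion before position i
    for i in range(len(seq)):
        out.add(seq[:i] + seq[i + 1:])               # deletion of position i
        for aa in AMINO_ACIDS:
            if aa != seq[i]:
                out.add(seq[:i] + aa + seq[i + 1:])  # substitution at position i
    return out


def toy_fitness(seq):
    """Hypothetical stand-in score: prefers a hydrophobic fraction near 0.4."""
    hydrophobic = set("AILMFVW")
    frac = sum(c in hydrophobic for c in seq) / max(len(seq), 1)
    return -abs(frac - 0.4)


def greedy_path(start, target, fitness=toy_fitness):
    """Greedy path of single-residue edits from start to target."""
    path = [start]
    current = start
    while current != target:
        d = edit_distance(current, target)
        # Sorting makes tie-breaking reproducible regardless of set ordering.
        candidates = sorted(s for s in single_edits(current)
                            if edit_distance(s, target) == d - 1)
        current = max(candidates, key=fitness)
        path.append(current)
    return path


path = greedy_path("MKTAYIAK", "MKSAYLAKQ")
```

Because each step lowers the edit distance by one, the path length here always equals the edit distance between the endpoints. A search aimed at keeping fitness high, as the abstract describes, might instead allow detours, which this greedy sketch does not.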

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601789/pdf/
Citations: 0