The weighted ensemble (WE) method is a widely used segment-based sampling technique known for its rigorous treatment of kinetics. A WE simulation typically first maps the configuration space onto a low-dimensional collective variable (CV) space and then partitions that space into bins. The efficacy of WE simulations depends heavily on the choice of CVs and binning scheme. The recently proposed State Predictive Information Bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and guiding enhanced sampling iteratively. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs, which enhance sampling in explored regions, with expert-based CVs, which guide exploration toward regions of interest, synergizing the strengths of both methods. Through benchmarking on alanine dipeptide and chignolin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest and reduces run-to-run variance. Moreover, our integration of the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways and by offering direct visualization of dynamics.
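As a rough illustration of the binning step described above, the following is a minimal sketch of a generic WE resampling pass: walkers are grouped by bin, heavy walkers are split and light walkers are merged until each occupied bin holds a target number of walkers, and the total probability weight is conserved. This is the textbook split/merge idea under simplifying assumptions, not the SPIB-guided scheme of the paper; the names `resample`, `resample_bin`, `bin_of`, and `target_per_bin` are hypothetical, and all weights are assumed to be positive.

```python
import random


def resample_bin(walkers, target, rng):
    """Split/merge (state, weight) pairs until the bin holds `target` walkers."""
    walkers = list(walkers)
    # Split: replace the heaviest walker by two copies carrying half its weight.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
    # Merge: combine the two lightest walkers; the survivor is chosen with
    # probability proportional to its weight and inherits the summed weight.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        survivor = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(survivor, w1 + w2)] + walkers[2:]
    return walkers


def resample(walkers, bin_of, target_per_bin, seed=0):
    """Group walkers by bin index (bin_of maps a CV value to a bin) and resample each bin."""
    rng = random.Random(seed)
    bins = {}
    for state, weight in walkers:
        bins.setdefault(bin_of(state), []).append((state, weight))
    out = []
    for key in sorted(bins):
        out.extend(resample_bin(bins[key], target_per_bin, rng))
    return out
```

Here `state` stands for a one-dimensional CV value; in practice each walker would carry a full molecular configuration, and the CV and bin boundaries are exactly the choices the abstract identifies as critical.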
{"title":"Augmenting Human Expertise in Weighted Ensemble Simulations through Deep Learning based Information Bottleneck.","authors":"Dedi Wang, Pratyush Tiwary","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The weighted ensemble (WE) method stands out as a widely used segment-based sampling technique renowned for its rigorous treatment of kinetics. The WE framework typically involves initially mapping the configuration space onto a low-dimensional collective variable (CV) space and then partitioning it into bins. The efficacy of WE simulations heavily depends on the selection of CVs and binning schemes. The recently proposed State Predictive Information Bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and guiding enhanced sampling through an iterative manner. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs to enhance sampling in explored regions with expert-based CVs to guide exploration in regions of interest, synergizing the strengths of both methods. Through benchmarking on alanine dipeptide and chignoin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest, and reduces run-to-run variances. 
Moreover, our integration of the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways, and offering direct visualization of dynamics.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213147/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141473380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katelin E J Scott, Maria F Hermosillo Arrieta, Aislinn J Williams
{"title":"Deciphering <i>SCN2A</i>: A comprehensive review of rodent models of <i>Scn2a</i> dysfunction.","authors":"Katelin E J Scott, Maria F Hermosillo Arrieta, Aislinn J Williams","doi":"","DOIUrl":"","url":null,"abstract":"","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nanjie Chen, Dongliang Yu, Dmitri Beglov, Mark Kon, Julio Enrique Castrillon-Candas
Recent advancements in protein docking site prediction have highlighted the limitations of traditional rigid docking algorithms, like PIPER, which often neglect critical stochastic elements such as solvent-induced fluctuations. These oversights can lead to inaccuracies in identifying viable docking sites due to the complexity of high-dimensional, stochastic energy manifolds with low regularity. To address this issue, our research introduces a novel model where the molecular shapes of ligands and receptors are represented using multivariate Karhunen-Loève (KL) expansions. This method effectively captures the stochastic nature of energy manifolds, allowing for a more accurate representation of molecular interactions. Developed as a plugin for PIPER, our scientific computing software enhances the platform, delivering robust uncertainty measures for the energy manifolds of ranked binding sites. Our results demonstrate that top-ranked binding sites, characterized by lower uncertainty in the stochastic energy manifold, align closely with actual docking sites. Conversely, sites with higher uncertainty correlate with less optimal docking positions. This distinction not only validates our approach but also sets a new standard in protein docking predictions, offering substantial implications for future molecular interaction research and drug development.
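For orientation, the generic scalar form of a Karhunen-Loève expansion of a random field is sketched below. The symbols are illustrative assumptions rather than the paper's notation, and the multivariate construction used for ligand and receptor shapes is not given in the abstract.

```latex
% Generic (scalar) KL expansion of a random field u with mean \bar{u}.
% (\lambda_k, \phi_k): eigenpairs of the covariance operator of u.
% Y_k: uncorrelated random variables with zero mean and unit variance.
u(x, \omega) = \bar{u}(x) + \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, \phi_k(x)\, Y_k(\omega),
\qquad
\mathbb{E}[Y_k] = 0, \quad \mathbb{E}[Y_j Y_k] = \delta_{jk}.
```

In practice the sum is truncated after a finite number of terms; the retained eigenvalues indicate how much of the field's variance the truncation captures.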
{"title":"Uncertainty quantification of receptor ligand binding sites prediction.","authors":"Nanjie Chen, Dongliang Yu, Dmitri Beglov, Mark Kon, Julio Enrique Castrillon-Candas","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Recent advancements in protein docking site prediction have highlighted the limitations of traditional rigid docking algorithms, like PIPER, which often neglect critical stochastic elements such as solvent-induced fluctuations. These oversights can lead to inaccuracies in identifying viable docking sites due to the complexity of high-dimensional, stochastic energy manifolds with low regularity. To address this issue, our research introduces a novel model where the molecular shapes of ligands and receptors are represented using multi-variate Karhunen-Lo `eve (KL) expansions. This method effectively captures the stochastic nature of energy manifolds, allowing for a more accurate representation of molecular interactions.Developed as a plugin for PIPER, our scientific computing software enhances the platform, delivering robust uncertainty measures for the energy manifolds of ranked binding sites. Our results demonstrate that top-ranked binding sites, characterized by lower uncertainty in the stochastic energy manifold, align closely with actual docking sites. Conversely, sites with higher uncertainty correlate with less optimal docking positions. 
This distinction not only validates our approach but also sets a new standard in protein docking predictions, offering substantial implications for future molecular interaction research and drug development.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10854274/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139725324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicolas R Mangine, Devin W Laurence, Patricia M Sabin, Wensi Wu, Christian Herz, Christopher N Zelonis, Justin S Unger, Csaba Pinter, Andras Lasso, Steve A Maas, Jeffrey A Weiss, Matthew A Jolley
Purpose: Many approaches have been used to model chordae tendineae geometries in finite element simulations of atrioventricular heart valves. Unfortunately, current "functional" chordae tendineae geometries lack fidelity (e.g., branching) that would be helpful when informing clinical decisions. The objectives of this work are (i) to improve synthetic chordae tendineae geometry fidelity to consider branching and (ii) to define how the chordae tendineae geometry affects finite element simulations of valve closure.
Methods: In this work, we develop an open-source method to construct synthetic chordae tendineae geometries in the SlicerHeart Extension of 3D Slicer. The generated geometries are then used in FEBio finite element simulations of atrioventricular valve function to evaluate how variations in chordae tendineae geometry influence valve behavior. Effects are evaluated using functional and mechanical metrics.
Results: Our findings demonstrated that altering the chordae tendineae geometry of a stereotypical mitral valve led to changes in clinically relevant valve metrics (regurgitant orifice area, contact area, and billowing volume) and valve mechanics (first principal strains). Specifically, cross-sectional area had the most influence over valve closure metrics, followed by chordae tendineae density, length, radius, and branches. We then used this information to showcase the flexibility of our new workflow by altering the chordae tendineae geometry of two additional geometries (mitral valve with annular dilation and tricuspid valve) to improve finite element predictions.
Conclusion: This study presents a flexible, open-source method for generating synthetic chordae tendineae with realistic branching structures. Further, we establish relationships between the chordae tendineae geometry and valve functional/mechanical metrics. This research contribution helps enrich our open-source workflow and brings the finite element simulations closer to use in a patient-specific clinical setting.
{"title":"Effect of Parametric Variation of Chordae Tendineae Structure on Simulated Atrioventricular Valve Closure.","authors":"Nicolas R Mangine, Devin W Laurence, Patricia M Sabin, Wensi Wu, Christian Herz, Christopher N Zelonis, Justin S Unger, Csaba Pinter, Andras Lasso, Steve A Maas, Jeffrey A Weiss, Matthew A Jolley","doi":"","DOIUrl":"","url":null,"abstract":"<p><strong>Purpose: </strong>Many approaches have been used to model chordae tendineae geometries in finite element simulations of atrioventricular heart valves. Unfortunately, current \"functional\" chordae tendineae geometries lack fidelity (e.g., branching) that would be helpful when informing clinical decisions. The objectives of this work are (i) to improve synthetic chordae tendineae geometry fidelity to consider branching and (ii) to define how the chordae tendineae geometry affects finite element simulations of valve closure.</p><p><strong>Methods: </strong>In this work, we develop an open-source method to construct synthetic chordae tendineae geometries in the SlicerHeart Extension of 3D Slicer. The generated geometries are then used in FEBio finite element simulations of atrioventricular valve function to evaluate how variations in chordae tendineae geometry influence valve behavior. Effects are evaluated using functional and mechanical metrics.</p><p><strong>Results: </strong>Our findings demonstrated that altering the chordae tendineae geometry of a stereotypical mitral valve led to changes in clinically relevant valve metrics (regurgitant orifice area, contact area, and billowing volume) and valve mechanics (first principal strains). Specifically, cross sectional area had the most influence over valve closure metrics, followed by chordae tendineae density, length, radius and branches. 
We then used this information to showcase the flexibility of our new workflow by altering the chordae tendineae geometry of two additional geometries (mitral valve with annular dilation and tricuspid valve) to improve finite element predictions.</p><p><strong>Conclusion: </strong>This study presents a flexible, open-source method for generating synthetic chordae tendineae with realistic branching structures. Further, we establish relationships between the chordae tendineae geometry and valve functional/mechanical metrics. This research contribution helps enrich our opensource workflow and brings the finite element simulations closer to use in a patient-specific clinical setting.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601809/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunchao Lance Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles David Weaver, Jens Meiler, Tyler Derr
While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: WelQrate Dataset Collection - we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high-quality data; WelQrate Evaluation Framework - we propose a standardized model evaluation framework considering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides a reliable benchmark for drug discovery experts conducting real-world virtual screening; Benchmarking - we evaluate model performance through various research questions using the WelQrate dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting our proposed WelQrate as the gold standard in small molecule drug discovery benchmarking. The WelQrate dataset collection, along with the curation code and experimental scripts, is publicly available at WelQrate.org.
{"title":"WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking.","authors":"Yunchao Lance Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles David Weaver, Jens Meiler, Tyler Derr","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, <i>WelQrate</i>. Specifically, our contributions are threefold: <b><i>WelQrate</i> Dataset Collection</b> - we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure the high-quality data in the datasets; <b><i>WelQrate</i> Evaluation Framework</b> - we propose a standardized model evaluation framework considering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides a reliable benchmarking for drug discovery experts conducting real-world virtual screening; <b>Benchmarking</b> - we evaluate model performance through various research questions using the <i>WelQrate</i> dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. 
In summary, we recommend adopting our proposed <i>WelQrate</i> as the gold standard in small molecule drug discovery benchmarking. The <i>WelQrate</i> dataset collection, along with the curation codes, and experimental scripts are all publicly available at WelQrate.org.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601797/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alzheimer's disease (AD) is the most prevalent form of dementia, affecting millions worldwide with a progressive decline in cognitive abilities. The AD continuum encompasses a prodromal stage known as Mild Cognitive Impairment (MCI), where patients may either progress to AD (MCIc) or remain stable (MCInc). Understanding the underlying mechanisms of AD requires complementary analyses relying on different data sources, leading to the development of multimodal deep learning models. In this study, we leveraged structural and functional Magnetic Resonance Imaging (sMRI/fMRI) to investigate the disease-induced grey matter and functional network connectivity changes. Moreover, considering AD's strong genetic component, we introduced Single Nucleotide Polymorphisms (SNPs) as a third channel. Given such diverse inputs, missing one or more modalities is a typical concern of multimodal methods. We hence propose a novel deep learning-based classification framework where a generative module employing Cycle Generative Adversarial Networks (cGAN) was adopted for imputing missing data within the latent space. Additionally, we adopted an Explainable Artificial Intelligence (XAI) method, Integrated Gradients (IG), to extract input features' relevance, enhancing our understanding of the learned representations. Two critical tasks were addressed: AD detection and MCI conversion prediction. Experimental results showed that our framework was able to reach the state-of-the-art in the classification of CN vs AD with an average test accuracy of 0.926 ± 0.02. For the MCInc vs MCIc task, we achieved an average prediction accuracy of 0.711 ± 0.01 using the pre-trained model for CN and AD. The interpretability analysis revealed that the classification performance was led by significant grey matter modulations in cortical and subcortical brain areas well known for their association with AD. 
Moreover, impairments in sensory-motor and visual resting state network connectivity along the disease continuum, as well as mutations in SNPs defining biological processes linked to endocytosis, amyloid-beta, and cholesterol, were identified as contributors to the achieved performance. Overall, our integrative deep learning approach shows promise for AD detection and MCI prediction, while shedding light on important biological insights.
{"title":"An interpretable generative multimodal neuroimaging-genomics framework for decoding alzheimer's disease.","authors":"Giorgio Dolci, Federica Cruciani, Md Abdur Rahaman, Anees Abrol, Jiayu Chen, Zening Fu, Ilaria Boscolo Galazzo, Gloria Menegaz, Vince D Calhoun","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is the most prevalent form of dementia, affecting millions worldwide with a progressive decline in cognitive abilities. The AD continuum encompasses a prodromal stage known as Mild Cognitive Impairment (MCI), where patients may either progress to AD (MCIc) or remain stable (MCInc). Understanding the underlying mechanisms of AD requires complementary analyses relying on different data sources, leading to the development of multimodal deep learning models. In this study, we leveraged structural and functional Magnetic Resonance Imaging (sMRI/fMRI) to investigate the disease-induced grey matter and functional network connectivity changes. Moreover, considering AD's strong genetic component, we introduced Single Nucleotide Polymorphisms (SNPs) as a third channel. Given such diverse inputs, missing one or more modalities is a typical concern of multimodal methods. We hence propose a novel deep learning-based classification framework where a generative module employing Cycle Generative Adversarial Networks (cGAN) was adopted for imputing missing data within the latent space. Additionally, we adopted an Explainable Artificial Intelligence (XAI) method, Integrated Gradients (IG), to extract input features' relevance, enhancing our understanding of the learned representations. Two critical tasks were addressed: AD detection and MCI conversion prediction. Experimental results showed that our framework was able to reach the state-of-the-art in the classification of CN vs AD with an average test accuracy of 0.926 ± 0.02. 
For the MCInc vs MCIc task, we achieved an average prediction accuracy of 0.711 ± 0.01 using the pre-trained model for CN and AD. The interpretability analysis revealed that the classification performance was led by significant grey matter modulations in cortical and subcortical brain areas well known for their association with AD. Moreover, impairments in sensory-motor and visual resting state network connectivity along the disease continuum, as well as mutations in SNPs defining biological processes linked to endocytosis, amyloid-beta, and cholesterol, were identified as contributors to the achieved performance. Overall, our integrative deep learning approach shows promise for AD detection and MCI prediction, while shading light on important biological insights.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213156/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141473378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nancy R Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, Dominika Ciupek, Antonio Tristán Vega, Santiago Aja-Fernández, Maciej Malawski, Gani Ouedraogo, Julia Machnio, Christian Ewert, Paul M Thompson, Neda Jahanshad, Eleftherios Garyfallidis, Bennett A Landman
White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. Specifically, there is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Harmonized submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x more subjects than the prior harmonization challenge, MUSHAC, and 100x more than SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences. 
A machine learning approach that learned voxel-wise cross-acquisition relationships was the most effective at harmonizing connectomic, microstructure, and macrostructure features, but it requires that the same subjects be scanned at each site and that the scans be co-registered. NeSH, a spatial and angular resampling method, was also effective and has a generalizable framework that does not rely on co-registration. Our code is available at https://github.com/nancynewlin-masi/QuantConn/.
{"title":"MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI.","authors":"Nancy R Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, Dominika Ciupek, Antonio Tristán Vega, Santiago Aja-Fernández, Maciej Malawski, Gani Ouedraogo, Julia Machnio, Christian Ewert, Paul M Thompson, Neda Jahanshad, Eleftherios Garyfallidis, Bennett A Landman","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. Specifically, there is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Harmonized submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. 
The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x additional subjects over prior harmonization challenge, MUSHAC and 100x over SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences. A machine learning approach that learned voxel-wise cross-acquisition relationships was the most effective at harmonizing connectomic, microstructure, and macrostructure features, but requires the same subject be scanned at each site co-registered. NeSH, a spatial and angular resampling method, was also effective and has generalizable framework not reliant co-registration. Our code is available at https://github.com/nancynewlin-masi/QuantConn/.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601790/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Coccarelli, Ioannis Polydoros, Alex Drysdale, Osama F Harraz, Chennakesava Kadapa
Cerebral autoregulation plays a key physiological role by limiting blood flow changes in the face of pressure fluctuations. Although the involved cellular processes are mechanically driven, the quantification of haemodynamic forces in in-vivo settings remains extremely difficult and uncertain. In this work, we propose a novel computational framework for evaluating the blood flow dynamics across networks of myogenically active cerebral arteries, which can modulate their muscular tone to stabilize flow (and perfusion pressure) as well as to limit vascular intramural stress. The introduced framework is built on contractile (myogenically active) vascular wall mechanics and blood flow dynamics models, which can be numerically coupled in either a weak or strong way. We investigate the time dependency of the vascular wall response to pressure changes at both single vessel and network levels. The robustness of the model was assessed by considering different types of inlet signals and numerical settings in an idealized vascular network formed by a middle cerebral artery and its three generations. For the vessel size and boundary conditions considered, weak coupling ensured accurate results with a lower computational cost. To complete the analysis, we evaluated the effect of an upstream pressure surge on the haemodynamics of the vascular network. This provided a clear quantitative picture of how pressure and flow are redistributed across each vessel generation upon inlet pressure changes. This work paves the way for future combined experimental-computational studies aiming to decipher cerebral autoregulation.
{"title":"A new computational model for quantifying blood flow dynamics across myogenically-active cerebral arterial networks.","authors":"Alberto Coccarelli, Ioannis Polydoros, Alex Drysdale, Osama F Harraz, Chennakesava Kadapa","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Cerebral autoregulation plays a key physiological role by limiting blood flow changes in the face of pressure fluctuations. Although the involved cellular processes are mechanically driven, the quantification of haemodynamic forces in in-vivo settings remains extremely difficult and uncertain. In this work, we propose a novel computational framework for evaluating the blood flow dynamics across networks of myogenically active cerebral arteries, which can modulate their muscular tone to stabilize flow (and perfusion pressure) as well as to limit vascular intramural stress. The introduced framework is built on contractile (myogenically active) vascular wall mechanics and blood flow dynamics models, which can be numerically coupled in either a weak or strong way. We investigate the time dependency of the vascular wall response to pressure changes at both single vessel and network levels. The robustness of the model was assessed by considering different types of inlet signals and numerical settings in an idealized vascular network formed by a middle cerebral artery and its three generations. For the vessel size and boundary conditions considered, weak coupling ensured accurate results with a lower computational cost. To complete the analysis, we evaluated the effect of an upstream pressure surge on the haemodynamics of the vascular network. This provided a clear quantitative picture of how pressure and flow are redistributed across each vessel generation upon inlet pressure changes. 
This work paves the way for future combined experimental-computational studies aiming to decipher cerebral autoregulation.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601795/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variations, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell appears across batches, enhancing interpretability. Retaining both fixed and random effect latent spaces improves classification accuracy. We applied our framework to three datasets spanning the cardiovascular system (Healthy Heart), Autism Spectrum Disorder (ASD), and Acute Myeloid Leukemia (AML). With 147 batches in the Healthy Heart dataset, far exceeding typical numbers, we tested our framework's ability to handle many batches. In the ASD dataset, our approach captured donor heterogeneity between autistic and healthy individuals. In the AML dataset, it distinguished donor heterogeneity despite missing cell types and diseased donors exhibiting both healthy and malignant cells. These results highlight our framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.
{"title":"Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects.","authors":"Aixa X Andrade, Son Nguyen, Albert Montillo","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variations, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell appears across batches, enhancing interpretability. Retaining both fixed and random effect latent spaces improves classification accuracy. We applied our framework to three datasets spanning the cardiovascular system (Healthy Heart), Autism Spectrum Disorder (ASD), and Acute Myeloid Leukemia (AML). With 147 batches in the Healthy Heart dataset-far exceeding typical numbers-we tested our framework's ability to handle many batches. In the ASD dataset, our approach captured donor heterogeneity between autistic and healthy individuals. In the AML dataset, it distinguished donor heterogeneity despite missing cell types and diseased donors exhibiting both healthy and malignant cells. 
These results highlight our framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601787/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pranav Kantroo, Günter P Wagner, Benjamin B Machta
The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single-residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. Consecutive sequences along these paths differ by a single-residue change; substitutions, insertions, and deletions are all admissible moves. The fitness of the intermediate sequences is evaluated using the protein language model ESM2 and kept as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.
{"title":"High fitness paths can connect proteins with low sequence overlap.","authors":"Pranav Kantroo, Günter P Wagner, Benjamin B Machta","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. The intermediate sequences in these paths differ by single residue changes over subsequent steps - substitutions, insertions and deletions are admissible moves. Their fitness is evaluated using the protein language model ESM2, and maintained as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. 
The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601789/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}