arXiv - QuanBio - Molecular Networks: Latest Papers, Page 2

RGDA-DDI: Residual graph attention network and dual-attention based framework for drug-drug interaction prediction

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-27 DOI: arxiv-2408.15310

Changjian Zhou, Xin Zhang, Jiafeng Li, Jia Song, Wensheng Xiang

Recent studies suggest that drug-drug interaction (DDI) prediction via computational approaches has significant importance for understanding the functions and co-prescriptions of multiple drugs. However, the existing in silico DDI prediction methods either ignore the potential interactions among drug-drug pairs (DDPs), or fail to explicitly model and fuse the multi-scale drug feature representations for better prediction. In this study, we propose RGDA-DDI, a residual graph attention network (residual-GAT) and dual-attention based framework for drug-drug interaction prediction. A residual-GAT module is introduced to simultaneously learn multi-scale feature representations from drugs and DDPs. In addition, a dual-attention based feature fusion block is constructed to learn local joint interaction representations. A series of evaluation metrics demonstrate that RGDA-DDI significantly improves DDI prediction performance on two public benchmark datasets, which provides new insight into drug development.

Citations: 0

Extrinsic Fluctuations in the p53 Cycle

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-22 DOI: arxiv-2408.12107

Manuel Eduardo Hernández-García, Mariana Gómez-Schiavon, Jorge Velázquez-Castro

Fluctuations are inherent to biological systems, arising from the stochastic nature of molecular interactions, and influence various aspects of system behavior, stability, and robustness. These fluctuations can be categorized as intrinsic, stemming from the system's inherent structure and dynamics, and extrinsic, arising from external factors, such as temperature variations. Understanding the interplay between these fluctuations is crucial for obtaining a comprehensive understanding of biological phenomena. However, studying these effects poses significant computational challenges. In this study, we used an underexplored methodology to analyze the effect of extrinsic fluctuations in stochastic systems using ordinary differential equations instead of solving the Master Equation with stochastic parameters. By incorporating temperature fluctuations into reaction rates, we explored the impact of extrinsic factors on system dynamics. We constructed a master equation and calculated the equations for the dynamics of the first two moments, offering computational efficiency compared with directly solving the chemical master equation. We applied this approach to analyze a biological oscillator, focusing on the p53 model and its response to temperature-induced extrinsic fluctuations. Our findings underscore the impact of extrinsic fluctuations on the nature of oscillations in biological systems, with alterations in oscillatory behavior depending on the characteristics of extrinsic fluctuations. We observed an increased oscillation amplitude and frequency of the p53 concentration cycle. This study provides valuable insights into the effects of extrinsic fluctuations on biological oscillations and highlights the importance of considering them in more complex systems to prevent unwanted scenarios related to health issues.

Citations: 0

Active learning of digenic functions with boolean matrix logic programming

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-19 DOI: arxiv-2408.14487

Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin

We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, based on comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs). Predicted host behaviours are not always correctly described by GEMs. Learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.
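
The general idea of evaluating a logic program with boolean matrices can be illustrated with a minimal sketch. This is not the authors' BMLP implementation; it only shows how the recursive datalog program `path(X,Y) :- edge(X,Y). path(X,Y) :- edge(X,Z), path(Z,Y).` can be computed as a fixpoint of boolean matrix products. The function name `transitive_closure` and the toy four-node relation are assumptions made for illustration.

```python
import numpy as np

def transitive_closure(edge):
    """Least fixpoint of path = edge OR (edge . path) over boolean matrices."""
    edge = np.asarray(edge, dtype=bool)
    path = edge.copy()
    while True:
        # Boolean matrix product: entry (i, j) is True if some k satisfies
        # edge[i, k] and path[k, j]. Integers are used to avoid overflow.
        step = (edge.astype(np.int64) @ path.astype(np.int64)) > 0
        new_path = path | step
        if np.array_equal(new_path, path):
            return path  # no new facts derived, fixpoint reached
        path = new_path

# Hypothetical toy relation: a chain 0 -> 1 -> 2 -> 3.
edge = np.array([[0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0]], dtype=bool)
closure = transitive_closure(edge)
```

Each row of `closure` lists everything reachable from one node, so a query such as "which entities follow from entity 0" becomes a single row lookup rather than a proof search.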

Citations: 0

NP-TCMtarget: a network pharmacology platform for exploring mechanisms of action of Traditional Chinese medicine

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-17 DOI: arxiv-2408.09142

Aoyi Wang, Yingdong Wang, Haoyang Peng, Haoran Zhang, Caiping Cheng, Jinzhong Zhao, Wuxia Zhang, Jianxin Chen, Peng Li

The biological targets of traditional Chinese medicine (TCM) are the core effectors mediating the interaction between TCM and the human body. Identification of TCM targets is essential to elucidate the chemical basis and mechanisms of TCM for treating diseases. Given the chemical complexity of TCM, both in silico high-throughput drug-target interaction predicting models and biological profile-based methods have been commonly applied for identifying TCM targets based on the structural information of TCM chemical components and biological information, respectively. However, the existing methods lack the integration of TCM chemical and biological information, resulting in difficulty in the systematic discovery of TCM action pathways. To solve this problem, we propose a novel target identification model NP-TCMtarget to explore the TCM target path by combining the overall chemical and biological profiles. First, NP-TCMtarget infers TCM effect targets by calculating associations between drug/disease inducible gene expression profiles and specific gene signatures for 8,233 targets. Then, NP-TCMtarget utilizes a constructed binary classification model to predict binding targets of herbal ingredients. Finally, we can distinguish TCM direct and indirect targets by comparing the effect targets and binding targets to establish the action pathways of herbal components-direct targets-indirect targets by mapping TCM targets in the biological molecular network. We apply NP-TCMtarget to the formula XiaoKeAn to demonstrate the power of revealing the action pathways of herbal formula. We expect that this novel model could provide a systematic framework for exploring the molecular mechanisms of TCM at the target level. NP-TCMtarget is available at http://www.bcxnfz.top/NP-TCMtarget.

Citations: 0

A Novel Quantum Algorithm for Efficient Attractor Search in Gene Regulatory Networks

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-16 DOI: arxiv-2408.08814

Mirko Rossini, Felix M. Weidner, Joachim Ankerhold, Hans A. Kestler

The description of gene interactions that constantly occur in the cellular environment is an extremely challenging task due to an immense number of degrees of freedom and incomplete knowledge about microscopic details. Hence, a coarse-grained and rather powerful modeling of such dynamics is provided by Boolean Networks (BNs). BNs are dynamical systems composed of Boolean agents and a record of their possible interactions over time. Stable states in these systems are called attractors, which are closely related to the cellular expression of biological phenotypes. Identifying the full set of attractors is, therefore, of substantial biological interest. However, for conventional high-performance computing, this problem is plagued by an exponential growth of the dynamic state space. Here, we demonstrate a novel quantum search algorithm inspired by Grover's algorithm to be implemented on quantum computing platforms. The algorithm performs an iterative suppression of states belonging to basins of previously discovered attractors from a uniform superposition, thus increasing the amplitudes of states in basins of yet unknown attractors. This approach guarantees that a new attractor state is measured with each iteration of the algorithm, an optimization not currently achieved by any other algorithm in the literature. Tests of its resistance to noise have also shown promising performance on devices from the current Noisy Intermediate-Scale Quantum (NISQ) computing era.
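
To make the notion of an attractor and the exponential cost of the classical search concrete, here is a minimal sketch of exhaustive attractor enumeration for a synchronous Boolean network. It is a classical baseline, not the quantum algorithm described in the abstract. The function names and the three-node toy network are assumptions chosen for illustration.

```python
from itertools import product

def find_attractors(update, n):
    """Enumerate all attractors of a synchronous Boolean network with n nodes.

    `update` maps a state tuple to its successor. The loop visits all 2**n
    start states, which is the exponential growth the abstract refers to.
    """
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = {}  # state -> position along the trajectory
        state = start
        # Follow the trajectory until a state repeats.
        while state not in seen:
            seen[state] = len(seen)
            state = update(state)
        # States visited from the first repeated state onward form the cycle.
        first = seen[state]
        cycle = [s for s, i in seen.items() if i >= first]
        # Canonical form: rotate the cycle to start at its smallest state.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors

def toy_update(state):
    """Hypothetical network: a' = b, b' = a, c' = a AND b."""
    a, b, c = state
    return (b, a, a & b)

attractors = find_attractors(toy_update, 3)
```

For this toy network the search finds two fixed points and one cycle of length two. Every start state belongs to the basin of exactly one of them.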

Citations: 0

Computational strategies for cross-species knowledge transfer and translational biomedicine

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-16 DOI: arxiv-2408.08503

Hao Yuan, Christopher A. Mancuso, Kayla Johnson, Ingo Braasch, Arjun Krishnan

Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. We introduce the term "agnology" to describe the functional equivalence of molecular components regardless of evolutionary origin, as this concept is becoming pervasive in integrative data-driven models where the role of evolutionary origin can become unclear. Our review addresses four key areas of information and knowledge transfer across species: (1) transferring disease and gene annotation knowledge, (2) identifying agnologous molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying agnologous cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer.

Citations: 0

Maximum entropy models for patterns of gene expression

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-15 DOI: arxiv-2408.08037

Camilla Sarra, Leopoldo Sarra, Luca Di Carlo, Trevor GrandPre, Yaojun Zhang, Curtis G. Callan Jr., William Bialek

New experimental methods make it possible to measure the expression levels of many genes, simultaneously, in snapshots from thousands or even millions of individual cells. Current approaches to analyze these experiments involve clustering or low-dimensional projections. Here we use the principle of maximum entropy to obtain a probabilistic description that captures the observed presence or absence of mRNAs from hundreds of genes in cells from the mammalian brain. We construct the Ising model compatible with experimental means and pairwise correlations, and validate it by showing that it gives good predictions for higher-order statistics. We notice that the probability distribution of cell states has many local maxima. By labeling cell states according to the associated maximum, we obtain a cell classification that agrees well with previous results that use traditional clustering techniques. Our results provide quantitative descriptions of gene expression statistics and interpretable criteria for defining cell classes, supporting the hypothesis that cell classes emerge from the collective interaction of gene expression levels.
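
The step of labeling a cell state by its associated local maximum can be sketched as follows. This is an illustrative toy, not the authors' code: it assumes a pairwise model with probability proportional to exp(-E(s)), where E(s) = -h.s - 0.5 s.J.s over binary states s (gene present or absent), and it assigns a state to a local maximum by greedy single-gene flips. The sign convention, the function names, and the parameters `h` and `J` are assumptions. Fitting `h` and `J` to measured means and pairwise correlations is not shown.

```python
import numpy as np

def energy(s, h, J):
    """E(s) = -h.s - 0.5 * s.J.s, with J symmetric and zero on the diagonal."""
    return -(h @ s) - 0.5 * (s @ J @ s)

def local_maximum(s, h, J):
    """Flip single genes greedily until no flip lowers the energy.

    Lower energy means higher probability, so the result is a local maximum
    of the model distribution. It serves as the class label of the input.
    """
    s = np.array(s, dtype=float)
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            flipped = s.copy()
            flipped[i] = 1.0 - flipped[i]
            if energy(flipped, h, J) < energy(s, h, J):
                s = flipped
                improved = True
    return tuple(int(x) for x in s)

# Hypothetical parameters: each gene is individually disfavoured, but
# genes 0 and 1 are strongly co-expressed.
h = np.array([-1.0, -1.0, -1.0])
J = np.array([[0.0, 3.0, 0.0],
              [3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

label_a = local_maximum((1, 0, 0), h, J)
label_b = local_maximum((1, 1, 1), h, J)
```

With these toy parameters the distribution has two local maxima, the all-absent state and the state with genes 0 and 1 co-expressed. The two inputs are assigned to different maxima, that is, to different classes.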

Citations: 0

The gene function prediction challenge: large language models and knowledge graphs to the rescue

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-13 DOI: arxiv-2408.07222

Rohan Shawn Sunil, Shan Chun Lim, Manoj Itharajula, Marek Mutwil

Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively experimentally verified functions. While bioinformatical gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of the gene function prediction methods nor the number of experimental characterisations of genes has increased dramatically in recent years. In this review, we will discuss the status quo and the trajectory of gene function elucidation and outline the recent advances in gene function prediction approaches. We will then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function predictions and keep us updated with scientific literature.

Citations: 0

On Networks and their Applications: Stability of Gene Regulatory Networks and Gene Function Prediction using Autoencoders

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-13 DOI: arxiv-2408.07064

Hamza Coban

We prove that nested canalizing functions are the minimum-sensitivity Boolean functions for any activity ratio and we determine the functional form of this boundary which has a nontrivial fractal structure. We further observe that the majority of the gene regulatory functions found in known biological networks (submitted to the Cell Collective database) lie on the line of minimum sensitivity which paradoxically remains largely in the unstable regime. Our results provide a quantitative basis for the argument that an evolutionary preference for nested canalizing functions in gene regulation (e.g., for higher robustness) and for elasticity of gene activity are sufficient for concentration of such systems near the "edge of chaos." The original structure of gene regulatory networks is unknown due to the undiscovered functions of some genes. Most gene function discovery approaches make use of unsupervised clustering or classification methods that discover and exploit patterns in gene expression profiles. However, existing knowledge in the field derives from multiple and diverse sources. Incorporating this know-how for novel gene function prediction can, therefore, be expected to improve such predictions. We here propose a function-specific novel gene discovery tool that uses a semi-supervised autoencoder. Our method is thus able to address the needs of a modern researcher whose expertise is typically confined to a specific functional domain. Lastly, the dynamics of unorthodox learning approaches like biologically plausible learning algorithms are investigated and found to exhibit a general form of Einstein relation.
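
The quantity being minimized can be illustrated with a small sketch. It assumes that "sensitivity" means the average sensitivity of a Boolean function, that is, the number of single-input flips that change the output, averaged over all inputs. The abstract does not spell out the definition, so this reading is an assumption, as are the function names and the two three-input example functions.

```python
from itertools import product

def average_sensitivity(f, n):
    """Mean number of single-bit flips that change f, over all 2**n inputs."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            # Copy of x with only input i flipped.
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            total += f(x) != f(y)
    return total / 2 ** n

def nested_canalizing(x):
    """x0 AND (x1 OR x2): x0 = 0 fixes the output, then x1 = 1, then x2."""
    return x[0] & (x[1] | x[2])

def parity(x):
    """XOR of all inputs: every flip changes the output."""
    return x[0] ^ x[1] ^ x[2]

s_ncf = average_sensitivity(nested_canalizing, 3)
s_xor = average_sensitivity(parity, 3)
```

The nested canalizing example has average sensitivity 1.25, while parity reaches the maximum of 3.0 for three inputs. This matches the direction of the claim that nested canalizing functions sit at the low-sensitivity end.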

Citations: 0

Discovering Motifs to Fingerprint Multi-Layer Networks: a Case Study on the Connectome of C. Elegans

arXiv - QuanBio - Molecular Networks

Pub Date : 2024-08-09 DOI: arxiv-2408.13263

Deepak Sharma, Matthias Renz, Philipp Hövel

Motif discovery is a powerful approach to understanding network structures and their function. We present a comprehensive analysis of regulatory motifs in the connectome of the model organism Caenorhabditis elegans (C. elegans). Leveraging the Efficient Subgraph Counting Algorithmic PackagE (ESCAPE) algorithm, we identify network motifs in the multi-layer nervous system of C. elegans and link them to functional circuits. We further investigate motif enrichment within signal pathways and benchmark our findings with random networks of similar size and link density. Our findings provide valuable insights into the organization of the nerve net of this well documented organism and can be easily transferred to other species and disciplines alike.

Citations: 0