Recent studies suggest that computational drug-drug interaction (DDI) prediction is important for understanding the functions and co-prescription of multiple drugs. However, existing in silico DDI prediction methods either ignore the potential interactions among drug-drug pairs (DDPs) or fail to explicitly model and fuse multi-scale drug feature representations for better prediction. In this study, we propose RGDA-DDI, a residual graph attention network (residual-GAT) and dual-attention based framework for drug-drug interaction prediction. A residual-GAT module is introduced to simultaneously learn multi-scale feature representations from drugs and DDPs. In addition, a dual-attention based feature fusion block is constructed to learn local joint interaction representations. Across a series of evaluation metrics, RGDA-DDI significantly improved DDI prediction performance on two public benchmark datasets, which provides new insight into drug development.
{"title":"RGDA-DDI: Residual graph attention network and dual-attention based framework for drug-drug interaction prediction","authors":"Changjian Zhou, Xin Zhang, Jiafeng Li, Jia Song, Wensheng Xiang","doi":"arxiv-2408.15310","DOIUrl":"https://doi.org/arxiv-2408.15310","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-27"}
Manuel Eduardo Hernández-García, Mariana Gómez-Schiavon, Jorge Velázquez-Castro
Fluctuations are inherent to biological systems, arising from the stochastic nature of molecular interactions, and influence various aspects of system behavior, stability, and robustness. These fluctuations can be categorized as intrinsic, stemming from the system's inherent structure and dynamics, or extrinsic, arising from external factors such as temperature variations. Understanding the interplay between the two is crucial for a comprehensive picture of biological phenomena. However, studying these effects poses significant computational challenges. In this study, we used an underexplored methodology to analyze the effect of extrinsic fluctuations in stochastic systems using ordinary differential equations instead of solving the master equation with stochastic parameters. By incorporating temperature fluctuations into reaction rates, we explored the impact of extrinsic factors on system dynamics. We constructed a master equation and derived the equations for the dynamics of the first two moments, which is computationally more efficient than directly solving the chemical master equation. We applied this approach to analyze a biological oscillator, focusing on the p53 model and its response to temperature-induced extrinsic fluctuations. Our findings underscore the impact of extrinsic fluctuations on the nature of oscillations in biological systems, with alterations in oscillatory behavior depending on the characteristics of the extrinsic fluctuations. We observed an increased oscillation amplitude and frequency of the p53 concentration cycle. This study provides valuable insights into the effects of extrinsic fluctuations on biological oscillations and highlights the importance of considering them in more complex systems to prevent unwanted scenarios related to health issues.
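To make the idea of moment equations concrete, the following is a minimal sketch for a much simpler system than the p53 model: a linear birth-death process with constant production rate `k` and per-molecule degradation rate `gamma`. Both rates, the function names, and the Euler integrator are assumptions chosen for illustration; no extrinsic (temperature) noise is included. For this process the chemical master equation gives closed equations for the mean and variance, so two ordinary differential equations replace the full probability distribution.

```python
def moment_odes(mean, var, k, gamma):
    """Right-hand sides of the mean and variance equations of a birth-death process.

    d<n>/dt   = k - gamma * <n>
    dVar/dt   = k + gamma * <n> - 2 * gamma * Var
    """
    d_mean = k - gamma * mean
    d_var = k + gamma * mean - 2.0 * gamma * var
    return d_mean, d_var


def integrate_moments(k=10.0, gamma=0.5, t_end=40.0, dt=0.001):
    """Forward-Euler integration starting from zero molecules (hypothetical values)."""
    mean, var = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        d_mean, d_var = moment_odes(mean, var, k, gamma)
        mean += dt * d_mean
        var += dt * d_var
    return mean, var


mean, var = integrate_moments()
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

At long times both moments approach `k / gamma`, the Poisson result for this process. Nonlinear networks generally need a closure approximation before the moment equations can be integrated this way.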
{"title":"Extrinsic Fluctuations in the p53 Cycle","authors":"Manuel Eduardo Hernández-García, Mariana Gómez-Schiavon, Jorge Velázquez-Castro","doi":"arxiv-2408.12107","DOIUrl":"https://doi.org/arxiv-2408.12107","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-22"}
Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin
We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, based on comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs). Predicted host behaviours are not always correctly described by GEMs. Learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using Datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.
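The general idea of evaluating a logic program with Boolean matrices can be illustrated on a textbook recursive Datalog program, `path(X,Y) :- edge(X,Y).` and `path(X,Y) :- edge(X,Z), path(Z,Y).` The sketch below is not the BMLP implementation described above; the function names and the toy `edge` relation are assumptions. It encodes `edge` as a Boolean adjacency matrix and computes `path` as a fixpoint of Boolean matrix products.

```python
def bool_matmul(a, b):
    """Boolean matrix product: result[i][j] is True if some k has a[i][k] and b[k][j]."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(edge):
    """Iterate path <- path OR (path x path) until nothing changes."""
    path = [row[:] for row in edge]
    while True:
        step = bool_matmul(path, path)
        new = [[p or s for p, s in zip(p_row, s_row)]
               for p_row, s_row in zip(path, step)]
        if new == path:
            return path
        path = new


# Toy relation (hypothetical): edge(0,1), edge(1,2), edge(2,3).
edge = [[False] * 4 for _ in range(4)]
for u, v in [(0, 1), (1, 2), (2, 3)]:
    edge[u][v] = True

path = transitive_closure(edge)
print(path[0][3], path[3][0])
```

Each query such as `path(0,3)` then becomes a single matrix lookup, which is what makes the matrix encoding attractive for large programs.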
{"title":"Active learning of digenic functions with boolean matrix logic programming","authors":"Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin","doi":"arxiv-2408.14487","DOIUrl":"https://doi.org/arxiv-2408.14487","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-19"}
The biological targets of traditional Chinese medicine (TCM) are the core effectors mediating the interaction between TCM and the human body. Identification of TCM targets is essential to elucidate the chemical basis and mechanisms of TCM for treating diseases. Given the chemical complexity of TCM, both in silico high-throughput drug-target interaction prediction models and biological profile-based methods have been commonly applied for identifying TCM targets, based on the structural information of TCM chemical components and on biological information, respectively. However, the existing methods do not integrate TCM chemical and biological information, which makes the systematic discovery of TCM action pathways difficult. To solve this problem, we propose a novel target identification model, NP-TCMtarget, to explore the TCM target path by combining the overall chemical and biological profiles. First, NP-TCMtarget infers TCM effect targets by calculating associations between drug/disease-inducible gene expression profiles and specific gene signatures for 8,233 targets. Then, NP-TCMtarget uses a binary classification model to predict binding targets of herbal ingredients. Finally, we distinguish direct and indirect TCM targets by comparing the effect targets with the binding targets, and establish action pathways linking herbal components to direct targets and then to indirect targets by mapping TCM targets onto the biological molecular network. We apply NP-TCMtarget to the formula XiaoKeAn to demonstrate its power to reveal the action pathways of a herbal formula. We expect that this novel model could provide a systematic framework for exploring the molecular mechanisms of TCM at the target level. NP-TCMtarget is available at http://www.bcxnfz.top/NP-TCMtarget.
{"title":"NP-TCMtarget: a network pharmacology platform for exploring mechanisms of action of Traditional Chinese medicine","authors":"Aoyi Wang, Yingdong Wang, Haoyang Peng, Haoran Zhang, Caiping Cheng, Jinzhong Zhao, Wuxia Zhang, Jianxin Chen, Peng Li","doi":"arxiv-2408.09142","DOIUrl":"https://doi.org/arxiv-2408.09142","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-17"}
Mirko Rossini, Felix M. Weidner, Joachim Ankerhold, Hans A. Kestler
The description of gene interactions that constantly occur in the cellular environment is an extremely challenging task due to an immense number of degrees of freedom and incomplete knowledge about microscopic details. Hence, a coarse-grained and rather powerful modeling of such dynamics is provided by Boolean Networks (BNs). BNs are dynamical systems composed of Boolean agents and a record of their possible interactions over time. Stable states in these systems are called attractors, which are closely related to the cellular expression of biological phenotypes. Identifying the full set of attractors is, therefore, of substantial biological interest. However, for conventional high-performance computing, this problem is plagued by an exponential growth of the dynamic state space. Here, we demonstrate a novel quantum search algorithm, inspired by Grover's algorithm, to be implemented on quantum computing platforms. The algorithm iteratively suppresses states belonging to basins of previously discovered attractors from a uniform superposition, thus increasing the amplitudes of states in basins of yet unknown attractors. This approach guarantees that a new attractor state is measured with each iteration of the algorithm, an optimization not currently achieved by any other algorithm in the literature. Tests of its resistance to noise have also shown promising performance on devices from the current Noisy Intermediate-Scale Quantum (NISQ) era.
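The classical baseline that suffers from this exponential growth can be sketched in a few lines. The example below is a brute-force search, not the quantum algorithm described above: it starts a synchronous trajectory from every one of the 2^n states and records the cycle each one falls into. The three-node update rule and all names are hypothetical and chosen only so the result can be checked by hand.

```python
from itertools import product


def update(state):
    """Synchronous update of a hypothetical 3-gene Boolean network."""
    x0, x1, x2 = state
    return (x1, x0, x0 and x1)


def find_attractors(step, n_nodes):
    """Follow every initial state until it revisits a state; collect the cycles."""
    attractors = set()
    for start in product((0, 1), repeat=n_nodes):  # 2**n_nodes start states
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state)
        cycle = seen[seen.index(state):]  # states from the first revisit onward
        attractors.add(frozenset(cycle))
    return attractors


attractors = find_attractors(update, 3)
for attractor in attractors:
    print(sorted(attractor))
```

This toy network has two fixed points, (0, 0, 0) and (1, 1, 1), and one cycle of length two. The outer loop over all start states is the part whose cost doubles with every added gene.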
{"title":"A Novel Quantum Algorithm for Efficient Attractor Search in Gene Regulatory Networks","authors":"Mirko Rossini, Felix M. Weidner, Joachim Ankerhold, Hans A. Kestler","doi":"arxiv-2408.08814","DOIUrl":"https://doi.org/arxiv-2408.08814","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-16"}
Hao Yuan, Christopher A. Mancuso, Kayla Johnson, Ingo Braasch, Arjun Krishnan
Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. We introduce the term "agnology" to describe the functional equivalence of molecular components regardless of evolutionary origin, as this concept is becoming pervasive in integrative data-driven models where the role of evolutionary origin can become unclear. Our review addresses four key areas of information and knowledge transfer across species: (1) transferring disease and gene annotation knowledge, (2) identifying agnologous molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying agnologous cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer.
{"title":"Computational strategies for cross-species knowledge transfer and translational biomedicine","authors":"Hao Yuan, Christopher A. Mancuso, Kayla Johnson, Ingo Braasch, Arjun Krishnan","doi":"arxiv-2408.08503","DOIUrl":"https://doi.org/arxiv-2408.08503","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-16"}
Camilla Sarra, Leopoldo Sarra, Luca Di Carlo, Trevor GrandPre, Yaojun Zhang, Curtis G. Callan Jr., William Bialek
New experimental methods make it possible to measure the expression levels of many genes, simultaneously, in snapshots from thousands or even millions of individual cells. Current approaches to analyze these experiments involve clustering or low-dimensional projections. Here we use the principle of maximum entropy to obtain a probabilistic description that captures the observed presence or absence of mRNAs from hundreds of genes in cells from the mammalian brain. We construct the Ising model compatible with experimental means and pairwise correlations, and validate it by showing that it gives good predictions for higher-order statistics. We notice that the probability distribution of cell states has many local maxima. By labeling cell states according to the associated maximum, we obtain a cell classification that agrees well with previous results that use traditional clustering techniques. Our results provide quantitative descriptions of gene expression statistics and interpretable criteria for defining cell classes, supporting the hypothesis that cell classes emerge from the collective interaction of gene expression levels.
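Labeling states by their associated local maximum can be sketched on a toy pairwise model. The example below assumes binary variables s_i in {0, 1} (mRNA absent or present) and a probability proportional to exp(-E(s)) with E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j. The fields `H`, couplings `J`, and the greedy single-flip ascent are hypothetical choices for illustration, not parameters fitted to data or the procedure used in the study.

```python
from itertools import product

# Hypothetical parameters for 4 genes: two co-expressed pairs that exclude each other.
H = [-0.5, -0.5, -0.5, -0.5]
J = {(0, 1): 2.0, (2, 3): 2.0,
     (0, 2): -3.0, (0, 3): -3.0, (1, 2): -3.0, (1, 3): -3.0}


def energy(state):
    """Energy of a state; lower energy means higher probability."""
    e = -sum(h * s for h, s in zip(H, state))
    e -= sum(j * state[a] * state[b] for (a, b), j in J.items())
    return e


def flip(state, i):
    """Return a copy of the state with gene i toggled."""
    return state[:i] + (1 - state[i],) + state[i + 1:]


def local_maximum(state):
    """Greedy ascent in probability: take the best single flip until none helps."""
    while True:
        best = min((flip(state, i) for i in range(len(state))), key=energy)
        if energy(best) >= energy(state):
            return state
        state = best


labels = {s: local_maximum(s) for s in product((0, 1), repeat=len(H))}
classes = set(labels.values())
print(sorted(classes))
```

With these couplings the 16 states fall into three classes, labeled by the maxima (0, 0, 0, 0), (1, 1, 0, 0) and (0, 0, 1, 1). For hundreds of genes the states would be observed cells rather than an exhaustive enumeration.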
{"title":"Maximum entropy models for patterns of gene expression","authors":"Camilla Sarra, Leopoldo Sarra, Luca Di Carlo, Trevor GrandPre, Yaojun Zhang, Curtis G. Callan Jr., William Bialek","doi":"arxiv-2408.08037","DOIUrl":"https://doi.org/arxiv-2408.08037","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-15"}
Rohan Shawn Sunil, Shan Chun Lim, Manoj Itharajula, Marek Mutwil
Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively experimentally verified functions. While bioinformatic gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of gene function prediction methods nor the number of experimentally characterised genes has increased dramatically in recent years. In this review, we will discuss the status quo and the trajectory of gene function elucidation and outline the recent advances in gene function prediction approaches. We will then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function predictions and keep us updated with scientific literature.
{"title":"The gene function prediction challenge: large language models and knowledge graphs to the rescue","authors":"Rohan Shawn Sunil, Shan Chun Lim, Manoj Itharajula, Marek Mutwil","doi":"arxiv-2408.07222","DOIUrl":"https://doi.org/arxiv-2408.07222","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-13"}
We prove that nested canalizing functions are the minimum-sensitivity Boolean functions for any activity ratio, and we determine the functional form of this boundary, which has a nontrivial fractal structure. We further observe that the majority of the gene regulatory functions found in known biological networks (submitted to the Cell Collective database) lie on the line of minimum sensitivity, which paradoxically remains largely in the unstable regime. Our results provide a quantitative basis for the argument that an evolutionary preference for nested canalizing functions in gene regulation (e.g., for higher robustness) and for elasticity of gene activity are sufficient for concentration of such systems near the "edge of chaos." The original structure of gene regulatory networks is unknown due to the undiscovered functions of some genes. Most gene function discovery approaches make use of unsupervised clustering or classification methods that discover and exploit patterns in gene expression profiles. However, existing knowledge in the field derives from multiple and diverse sources. Incorporating this know-how into novel gene function prediction can, therefore, be expected to improve such predictions. We here propose a function-specific novel gene discovery tool that uses a semi-supervised autoencoder. Our method is thus able to address the needs of a modern researcher whose expertise is typically confined to a specific functional domain. Lastly, the dynamics of unorthodox learning approaches, such as biologically plausible learning algorithms, are investigated and found to exhibit a general form of the Einstein relation.
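The two quantities in the first result can be computed directly for small functions. The sketch below takes the average sensitivity to be the mean number of single-input flips that change the output, and assumes that "activity ratio" means the fraction of inputs mapped to 1. That reading, and all function names, are assumptions made for illustration. It compares two nested canalizing functions with the parity function, which is not canalizing.

```python
from itertools import product


def average_sensitivity(f, n):
    """Mean, over all 2**n inputs, of the number of single-bit flips that change f."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            total += f(x) != f(flipped)
    return total / 2 ** n


def activity_ratio(f, n):
    """Fraction of inputs with output 1 (assumed meaning of 'activity ratio')."""
    return sum(bool(f(x)) for x in product((0, 1), repeat=n)) / 2 ** n


def and3(x):
    return x[0] and x[1] and x[2]      # nested canalizing


def nested(x):
    return x[0] and (x[1] or x[2])     # nested canalizing


def xor3(x):
    return x[0] ^ x[1] ^ x[2]          # parity, not canalizing


for name, f in [("and3", and3), ("nested", nested), ("xor3", xor3)]:
    print(name, activity_ratio(f, 3), average_sensitivity(f, 3))
```

The parity function reaches the maximum sensitivity of 3, while the two nested canalizing functions stay at 0.75 and 1.25. This is only a spot check on three inputs; the three functions have different activity ratios, so it does not test the minimum-sensitivity claim at fixed activity ratio.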
{"title":"On Networks and their Applications: Stability of Gene Regulatory Networks and Gene Function Prediction using Autoencoders","authors":"Hamza Coban","doi":"arxiv-2408.07064","DOIUrl":"https://doi.org/arxiv-2408.07064","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-13"}
Motif discovery is a powerful approach to understanding network structures and their function. We present a comprehensive analysis of regulatory motifs in the connectome of the model organism Caenorhabditis elegans (C. elegans). Leveraging the Efficient Subgraph Counting Algorithmic PackagE (ESCAPE) algorithm, we identify network motifs in the multi-layer nervous system of C. elegans and link them to functional circuits. We further investigate motif enrichment within signal pathways and benchmark our findings against random networks of similar size and link density. Our findings provide valuable insights into the organization of the nerve net of this well-documented organism and can be easily transferred to other species and disciplines alike.
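The benchmarking step can be illustrated with a brute-force count of one motif, the feed-forward loop (a→b, b→c, a→c), compared against random directed graphs with the same number of nodes and edges. This sketch does not use ESCAPE, and the toy edge list, the sample size, and the seed are hypothetical; it only shows what comparing a motif count with a random baseline means.

```python
import random
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edges for node in edge}
    return sum(
        (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
        for a, b, c in permutations(nodes, 3)
    )


def random_graph(n_nodes, n_edges, rng):
    """Directed graph without self-loops, with edges drawn uniformly at random."""
    possible = [(u, v) for u in range(n_nodes) for v in range(n_nodes) if u != v]
    return rng.sample(possible, n_edges)


# Toy directed network (hypothetical): 5 nodes, 6 edges.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (3, 4)]
observed = count_feed_forward_loops(toy_edges)

rng = random.Random(0)
baseline = [count_feed_forward_loops(random_graph(5, 6, rng)) for _ in range(200)]
mean_random = sum(baseline) / len(baseline)
print(f"observed = {observed}, random mean = {mean_random:.2f}")
```

A motif counts as enriched when the observed count lies well above the random distribution. Brute-force enumeration is cubic in the number of nodes, which is why dedicated subgraph-counting algorithms are used on real connectomes.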
{"title":"Discovering Motifs to Fingerprint Multi-Layer Networks: a Case Study on the Connectome of C. Elegans","authors":"Deepak Sharma, Matthias Renz, Philipp Hövel","doi":"arxiv-2408.13263","DOIUrl":"https://doi.org/arxiv-2408.13263","journal":{"name":"arXiv - QuanBio - Molecular Networks"},"publicationDate":"2024-08-09"}