The Genomic Code: The genome instantiates a generative model of the organism
Kevin J. Mitchell, Nick Cheney
arXiv - QuanBio - Other Quantitative Biology, published 2024-07-22
DOI: arxiv-2407.15908 (https://doi.org/arxiv-2407.15908)
Abstract
How does the genome encode the form of the organism? What is the nature of
this genomic code? Common metaphors, such as a blueprint or program, fail to
capture the complex, indirect, and evolutionarily dynamic relationship between
the genome and organismal form, or the constructive, interactive processes that
produce it. Such metaphors are also not readily formalised, either to treat
empirical data or to simulate genomic encoding of form in silico. Here, we
propose a new analogy, inspired by recent work in machine learning and
neuroscience: that the genome encodes a generative model of the organism. In
this scheme, by analogy with variational autoencoders, the genome does not
encode either organismal form or developmental processes directly, but
comprises a compressed space of latent variables. These latent variables are
the DNA sequences that specify the biochemical properties of encoded proteins
and the relative affinities between trans-acting regulatory factors and their
target sequence elements. Collectively, these comprise a connectionist network,
with weights that get encoded by the learning algorithm of evolution and
decoded through the processes of development. The latent variables collectively
shape an energy landscape that constrains the self-organising processes of
development so as to reliably produce a new individual of a certain type,
providing a direct analogy to Waddington's famous epigenetic landscape. The
generative model analogy accounts for the complex, distributed genetic
architecture of most traits and the emergent robustness and evolvability of
developmental processes. It also provides a new way to explain the independent
selectability of specific traits, drawing on the idea of multiplexed
disentangled representations observed in artificial and neural systems, and
lends itself to formalisation.
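
The energy-landscape idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' model: it swaps in a small Hopfield-style network, a classic energy-based connectionist model, in place of the variational autoencoder the abstract draws on. All names (`encode_weights`, `energy`, `develop`, `target`) are hypothetical and chosen only for illustration. The weights stand in for genomically encoded interaction strengths, and the update loop stands in for a self-organising developmental process that settles into a low-energy attractor.

```python
# Toy illustration only: a Hopfield-style network, NOT the model proposed in the paper.
# Assumption: "weights" play the role of genome-encoded affinities, "state" the role
# of a developing system, and "target" the form that is reliably produced.

def encode_weights(target):
    """Hebbian rule: store one pattern of +1/-1 entries in symmetric weights."""
    n = len(target)
    return [[0.0 if i == j else target[i] * target[j] / n for j in range(n)]
            for i in range(n)]

def energy(weights, state):
    """Hopfield energy E = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def develop(weights, state, sweeps=5):
    """Asynchronous sign updates; no single update increases the energy."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state

# The "form" that the weights were shaped to produce.
target = [1, -1, 1, 1, -1, -1, 1, -1]
weights = encode_weights(target)

# A perturbed starting condition: the first two entries are flipped.
noisy = [-target[0], -target[1]] + target[2:]

final = develop(weights, noisy)
print("recovered target:", final == target)
print("energy before:", energy(weights, noisy))
print("energy after: ", energy(weights, final))
```

In this toy setting the perturbed state rolls downhill to the stored pattern, which is loosely analogous to a robust developmental trajectory. The weights do not describe the final pattern entry by entry; they only shape the landscape on which the dynamics run.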