Constructing the Molecular Tree of Life using Assembly Theory and Mass Spectrometry
Amit Kahana, Alasdair MacLeod, Hessam Mehr, Abhishek Sharma, Emma Carrick, Michael Jirasek, Sara Walker, Leroy Cronin
arXiv - QuanBio - Populations and Evolution, published 2024-08-17, arXiv:2408.09305
Abstract
Here we demonstrate the first biochemistry-agnostic approach to map
evolutionary relationships at the molecular scale, allowing the construction of
phylogenetic models using mass spectrometry (MS) and Assembly Theory (AT)
without elucidating molecular identities. AT allows us to estimate the
complexity of molecules by deducing the amount of shared information stored
within them. By examining 74 samples from a diverse range of biotic and
abiotic sources, we used tandem MS data to detect 24102 analytes (9262 unique)
and 59518 molecular fragments (6755 unique). Using this MS dataset, together
with AT, we were able to infer the joint assembly spaces (JAS) of samples from
molecular analytes. We show how JAS allows agnostic annotation of samples
without fingerprinting exact analyte identities, facilitating accurate
determination of their biogenicity and taxonomical grouping. Furthermore, we
developed an AT-based framework to construct a biochemistry-agnostic
phylogenetic tree which is consistent with genome-based models and outperforms
other similarity-based algorithms. Finally, we were able to use AT to track
colony lineages of a single bacterial species based on phenotypic variation in
their molecular composition with high accuracy, which would be challenging to
track with genomic data. Our results demonstrate how AT can expand causal
molecular inference to non-sequence information without requiring exact
molecular identities, thereby opening the possibility to study previously
inaccessible biological domains.
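
To make the idea of a "similarity-based" tree concrete, the sketch below builds a small tree from the fragment sets that samples share. It is only an illustration of the generic baseline family the abstract compares against: it uses plain Jaccard distance and average-linkage clustering, not the paper's Assembly Theory or joint assembly space calculation. The sample names, fragment labels, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each sample is a set of fragment labels
# (stand-ins for detected MS/MS fragments). Not data from the paper.
samples = {
    "bacterium_A": {"f1", "f2", "f3", "f4"},
    "bacterium_B": {"f1", "f2", "f3", "f5"},
    "yeast": {"f2", "f3", "f6", "f7"},
    "mineral": {"f8", "f9"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|: 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def count_total_and_unique(sample_sets):
    """Count detections summed over samples, and distinct labels overall."""
    total = sum(len(s) for s in sample_sets.values())
    unique = len(set().union(*sample_sets.values()))
    return total, unique


def average_linkage_tree(sample_sets):
    """Greedy agglomerative clustering; returns the tree as nested tuples."""
    # Each cluster is (subtree, list of member sample names).
    clusters = [(name, [name]) for name in sorted(sample_sets)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][1], clusters[j][1]
            # Average pairwise distance between the two clusters' members.
            dist = sum(
                jaccard_distance(sample_sets[a], sample_sets[b])
                for a in members_i
                for b in members_j
            ) / (len(members_i) * len(members_j))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    print(count_total_and_unique(samples))
    print(average_linkage_tree(samples))
```

In this toy input the two bacterial samples share the most fragments and are joined first, while the abiotic sample shares none and attaches last. The total-versus-unique count mirrors how the abstract reports analytes and fragments (all detections across samples versus distinct ones).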