Genome Biology and Evolution: Latest Articles

Comparative Genomics of the World's Smallest Mammals Reveals Links to Echolocation, Metabolism, and Body Size Plasticity.
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae225
Marie-Laurence Cossette, Donald T Stewart, Aaron B A Shafer

Originating 30 million years ago, shrews (Soricidae) have diversified into around 400 species worldwide. Shrews display a wide array of adaptations, with some species having developed distinctive traits such as echolocation, underwater diving, and venomous saliva. Accordingly, these tiny insectivores are ideal to study the genomic mechanisms of evolution and adaptation. We conducted a comparative genomic analysis of four shrew species and 16 other mammals to identify genomic variations unique to shrews. Using two existing shrew genomes and two de novo assemblies for the maritime (Sorex maritimensis) and smoky (Sorex fumeus) shrews, we identified mutations in conserved regions of the genomes, also known as accelerated regions, gene families that underwent significant expansion, and positively selected genes. Our analyses unveiled shrew-specific genomic variants in genes associated with the nervous, metabolic, and auditory systems, which can be linked to unique traits in shrews. Notably, genes suggested to be under convergent evolution in echolocating mammals exhibited accelerated regions in shrews, and pathways linked to putative body size plasticity were detected. These findings provide insight into the evolutionary mechanisms shaping shrew species, shedding light on their adaptation and divergence over time.

Citations: 0
Comprehensive Review on Plant Cytochrome P450 Evolution: Copy Number, Diversity, and Motif Analysis From Chlorophyta to Dicotyledoneae.
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae240
Yuanpeng Fang, Zheng Tai, Keyi Hu, Lingfeng Luo, Sanwei Yang, Mengmeng Liu, Xin Xie

Cytochrome P450 enzymes (CYPs) are widely distributed among various plant groups and constitute approximately 1% of the total number of protein-coding genes. Extensive studies suggest that CYPs are involved in nearly all molecular processes that occur in plants. Over the past two decades, the identification of CYP genes has expanded rapidly, with more than 40,000 CYP genes and 819 CYP families being discovered. Copy number variation is a significant evolutionary characteristic of gene families, yet a systematic characterization of the copy evolution patterns in plant CYP gene families has been lacking, resulting in confusion and challenges in understanding CYP functions. To address these concerns, this review provides comprehensive statistics and analyses of the copy number and diversity of almost all plant CYP gene families, focusing on CYP evolution from Chlorophyta to Dicotyledoneae. Additionally, we examined the subfamily characteristics of certain CYP families with restricted copy changes and identified several CYP subfamilies that play pivotal roles in this event. Furthermore, we analyzed the structural conservation of CYPs across different taxa and compiled a comprehensive database to support plant CYP studies. Our analysis revealed differences in the six core conserved motifs of plant CYP proteins among various clans and plant taxa, while demonstrating similar conservation patterns for the ERR (glutamic acid-arginine-arginine) triad motifs. These findings will significantly facilitate the understanding of plant CYP gene evolution and metabolic diversity and serve as a valuable reference for researchers studying CYP enzymes.

Citations: 0
Sequence-Based Machine Learning Reveals 3D Genome Differences between Bonobos and Chimpanzees.
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae210
Colin M Brand, Shuzhen Kuang, Erin N Gilbertson, Evonne McArthur, Katherine S Pollard, Timothy H Webster, John A Capra

The 3D structure of the genome is an important mediator of gene expression. As phenotypic divergence is largely driven by gene regulatory variation, comparing genome 3D contacts across species can further understanding of the molecular basis of species differences. However, while experimental data on genome 3D contacts in humans are increasingly abundant, only a handful of 3D genome contact maps exist for other species. Here, we demonstrate that human experimental data can be used to close this data gap. We apply a machine learning model that predicts 3D genome contacts from DNA sequence to the genomes from 56 bonobos and chimpanzees and identify species-specific patterns of genome folding. We estimated 3D divergence between individuals from the resulting contact maps in 4,420 1 Mb genomic windows, of which ∼17% were substantially divergent in predicted genome contacts. Bonobos and chimpanzees diverged at 89 windows, overlapping genes associated with multiple traits implicated in Pan phenotypic divergence. We discovered 51 bonobo-specific variants that individually produce the observed bonobo contact pattern in bonobo-chimpanzee divergent windows. Our results demonstrate that machine learning methods can leverage human data to fill in data gaps across species, offering the first look at population-level 3D genome variation in nonhuman primates. We also identify loci where changes in 3D folding may contribute to phenotypic differences in our closest living relatives.
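
The abstract does not say how 3D divergence between two predicted contact maps is scored. As a minimal sketch, assuming divergence is defined as 1 minus the Spearman rank correlation of the flattened maps (an assumption made here for illustration, not a detail confirmed by the text above), a per-window comparison could look like the following. The function names and the toy 2x2 maps are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def divergence(map_a, map_b):
    """Assumed score: 1 - Spearman correlation of two flattened contact maps."""
    flat_a = [v for row in map_a for v in row]
    flat_b = [v for row in map_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("contact maps must have the same shape")
    return 1 - pearson(ranks(flat_a), ranks(flat_b))


# Hypothetical toy maps for one genomic window (not data from the study).
window_individual_1 = [[1.0, 2.0], [3.0, 4.0]]
window_individual_2 = [[1.5, 2.5], [3.5, 9.0]]
print(divergence(window_individual_1, window_individual_2))
```

Under this assumed score, two maps whose contact values have the same rank order give a divergence of 0, and perfectly reversed rank orders give 2. A rank-based score is insensitive to monotonic rescaling of predicted contact values, which is one reason it might be preferred over a raw difference.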

Citations: 0
Microbiome Geographic Population Structure (mGPS) Detects Fine-Scale Geography.
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae209
Yali Zhang, Leo McCarthy, Emil Ruff, Eran Elhaik

Over the past decade, sequencing data generated by large microbiome projects showed that taxa exhibit patchy geographical distribution, raising questions about the geospatial dynamics that shape natural microbiomes and the spread of antimicrobial resistance genes. Answering these questions requires distinguishing between local and nonlocal microorganisms and identifying the source sites for the latter. Predicting the source sites and migration routes of microbiota has been envisioned for decades but was hampered by the lack of data, tools, and understanding of the processes governing biodiversity. State-of-the-art biogeographical tools suffer from low resolution and cannot predict biogeographical patterns at a scale relevant to ecological, medical, or epidemiological applications. Analyzing urban, soil, and marine microorganisms, we found that some taxa exhibit regional-specific composition and abundance, suggesting they can be used as biogeographical biomarkers. We developed the microbiome geographic population structure, a machine learning-based tool that utilizes microbial relative sequence abundances to yield a fine-scale source site for microorganisms. Microbiome geographic population structure predicted the source city for 92% of the samples and the within-city source for 82% of the samples, though they were often only a few hundred meters apart. Microbiome geographic population structure also predicted soil and marine sampling sites for 86% and 74% of the samples, respectively. We demonstrated that microbiome geographic population structure differentiated local from nonlocal microorganisms and used it to trace the global spread of antimicrobial resistance genes. Microbiome geographic population structure's ability to localize samples to their water body, country, city, and transit stations opens new possibilities in tracing microbiomes and has applications in forensics, medicine, and epidemiology.

Citations: 0
Expansion of MHC-IIB Has Constrained the Evolution of MHC-IIA in Passerines.
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae236
Iris Liesbeth Ruesink-Bueno, Anna Drews, Emily Amelia O'Connor, Helena Westerdahl

The major histocompatibility complex (MHC) is central in adaptive immunity, with the highly polymorphic MHC genes encoding antigen-presenting molecules. Two MHC class II (MHC-II) loci, DA1 and DA2, predate the radiation of extant birds and persist throughout much of the avian phylogeny. Within each locus, the MHC-II molecules are encoded by A-genes (DAA) and B-genes (DAB), which are arranged in A-B dyads. However, in passerines (order Passeriformes), the DA2 locus has been lost, and the ancestral A-B dyad at the DA1 locus has been replaced by a putatively single A-gene (DAA1) and an array of highly polymorphic B-genes (DAB1). In this study, we genotyped the DAA1 gene of 15 passerine species and confirmed that passerines possess just one copy of DAA1. We then compared selection patterns in DAA1 between passerines and nonpasserines and found that exon 2, which encodes the antigen-presenting domain, has been subject to weaker positive selection and stronger negative selection in passerines compared with nonpasserines. Additional comparisons showed that the patterns of selection in the passerine DAA1 gene are unlikely to be related to the loss of the DA2 locus. Instead, our findings suggest that the expansion of DAB1 (MHC-IIB) has imposed an evolutionary constraint on the passerine DAA1 (MHC-IIA) gene. We speculate that this constraint may be the result of each DAA1 chain forming heterodimers with many different DAB1 chains.

Citations: 0
Cis-regulatory Variation in Relation to Sex and Sexual Dimorphism in Drosophila melanogaster.
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae234
Prashastha Mishra, Tania S Barrera, Karl Grieshop, Aneil F Agrawal

Much of sexual dimorphism is likely due to sex-biased gene expression, which results from differential regulation of a genome that is largely shared between males and females. Here, we use allele-specific expression to explore cis-regulatory variation in Drosophila melanogaster in relation to sex. We develop a Bayesian framework to infer the transcriptome-wide joint distribution of cis-regulatory effects across the sexes. We also examine patterns of cis-regulatory variation with respect to two other levels of variation in sexual dimorphism: (i) across genes that vary in their degree of sex-biased expression and (ii) among tissues that vary in their degree of dimorphism (e.g. relatively low dimorphism in heads vs. high dimorphism in gonads). We uncover evidence of widespread cis-regulatory variation in all tissues examined, with female-biased genes being especially enriched for this variation. A sizeable proportion of cis-regulatory variation is inferred to have sex-specific effects, with sex-dependent cis effects being much more frequent in gonads than in heads. Finally, we find some genes where 1 allele contributes to more than 50% of a gene's expression in heterozygous males but <50% of its expression in heterozygous females. Such variants could provide a mechanism for sex-specific dominance reversals, a phenomenon important for sexually antagonistic balancing selection. However, tissue differences in allelic imbalance are approximately as frequent as sex differences, perhaps suggesting that sexual conflict may not be particularly unique in shaping patterns of expression variation.
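
The final pattern described above, one allele supplying more than 50% of expression in heterozygous males but less than 50% in heterozygous females, can be illustrated with raw allele-specific read counts. The sketch below is a naive point-estimate check, not the Bayesian framework the abstract mentions; the function names and read counts are hypothetical and chosen only for illustration.

```python
def allelic_proportion(allele_1_reads, allele_2_reads):
    """Fraction of allele-specific reads that come from allele 1."""
    total = allele_1_reads + allele_2_reads
    if total == 0:
        raise ValueError("no allele-specific reads for this gene")
    return allele_1_reads / total


def is_candidate_reversal(male_counts, female_counts):
    """True if allele 1 is the majority allele in one sex and the minority in the other.

    Each argument is a (allele_1_reads, allele_2_reads) pair from heterozygotes.
    Sampling noise is ignored here, so this only flags candidates.
    """
    p_male = allelic_proportion(*male_counts)
    p_female = allelic_proportion(*female_counts)
    return (p_male > 0.5 and p_female < 0.5) or (p_male < 0.5 and p_female > 0.5)


# Hypothetical read counts for one gene: (allele 1, allele 2).
print(is_candidate_reversal(male_counts=(70, 30), female_counts=(40, 60)))  # True
print(is_candidate_reversal(male_counts=(70, 30), female_counts=(60, 40)))  # False
```

Because proportions near 0.5 can flip sides through sampling noise alone, a real analysis would need to model read-count uncertainty rather than compare point estimates.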

Citations: 0
Chromosome-Level Genome Assembly of the Australian Rainforest Tree Rhodamnia argentea (Malletwood).
IF 3.2; Zone 2 (Biology); Q2, Evolutionary Biology; Pub Date: 2024-11-01; DOI: 10.1093/gbe/evae238
Stephanie H Chen, Ashley Jones, Patricia Lu-Irving, Jia-Yee S Yap, Marlien van der Merwe, Jason G Bragg, Richard J Edwards

Myrtaceae are a large family of woody plants, including hundreds that are currently under threat from the global spread of a fungal pathogen, Austropuccinia psidii (G. Winter) Beenken, which causes myrtle rust. A reference genome for the Australian native rainforest tree Rhodamnia argentea Benth. (malletwood) was assembled from Oxford Nanopore Technologies long-reads, 10x Genomics Chromium linked-reads, and Hi-C data (N50 = 32.3 Mb and BUSCO completeness 98.0%) with 99.0% of the 347 Mb assembly anchored to 11 chromosomes (2n = 22). The R. argentea genome will inform conservation efforts for Myrtaceae species threatened by myrtle rust, against which it shows variable resistance. We observed contamination in the sequencing data, and further investigation revealed an arthropod source. This study emphasizes the importance of checking sequencing data for contamination, especially when working with nonmodel organisms. It also enhances our understanding of a tree that faces conservation challenges, contributing to broader biodiversity initiatives.
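
The N50 reported above is a standard contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of the calculation follows; the toy lengths are hypothetical and are not the R. argentea scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, lengths in Mb.
toy_lengths = [40, 30, 20, 5, 3, 2]
print(n50(toy_lengths))  # 30
```

In the toy case the total is 100 Mb; the longest sequence alone covers 40 Mb, and adding the second reaches 70 Mb, so the N50 is 30 Mb.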

Citations: 0
Natural Diversity of Heat-Induced Transcription of Retrotransposons in Arabidopsis thaliana.
IF 3.2 Zone 2 (Biology) Q2 EVOLUTIONARY BIOLOGY Pub Date: 2024-11-01 DOI: 10.1093/gbe/evae242
Wenbo Xu, Michael Thieme, Anne C Roulin

Transposable elements (TEs) are major components of plant genomes, profoundly impacting the fitness of their hosts. However, technical bottlenecks have long hindered our mechanistic understanding of TEs. Using RNA-Seq and long-read sequencing with Oxford Nanopore Technologies' (ONT) direct cDNA sequencing, we analyzed the heat-induced transcription of TEs in three natural accessions of Arabidopsis thaliana (Cvi-0, Col-0, and Ler-1). In addition to the well-studied ONSEN retrotransposon family, we confirmed Copia-35 as a second heat-responsive retrotransposon family with particularly high activity in the relict accession Cvi-0. Our analysis revealed distinct expression patterns of individual TE copies and suggests different mechanisms regulating GAG protein production in the ONSEN versus Copia-35 families. In addition, analogously to ONSEN, Copia-35 activation led to the upregulation of flanking genes such as APUM9 and potentially to the quantitative modulation of flowering time. ONT data allowed us to test the extent to which read-through formation is important in the regulation of adjacent genes. Unexpectedly, our results indicate that for both families, the upregulation of flanking genes is not predominantly directly initiated by transcription from their 3' long terminal repeats. These findings highlight the intraspecific expressional diversity linked to retrotransposon activation under stress.

Citations: 0
Structural Diversity and Distribution of Nuclear Matrix Constituent Protein Class Nuclear Lamina Proteins in Streptophytic Algae.
IF 3.2 Zone 2 (Biology) Q2 EVOLUTIONARY BIOLOGY Pub Date: 2024-11-01 DOI: 10.1093/gbe/evae244
Brendan S Kosztyo, Eric J Richards

Nuclear matrix constituent proteins in plants function like animal lamins, providing the structural foundation of the nuclear lamina and regulating nuclear organization and morphology. Although they are well characterized in angiosperms, the presence and structure of nuclear matrix constituent proteins in more distantly related species, such as streptophytic algae, are relatively unknown. The rapid evolution of nuclear matrix constituent proteins throughout the plant lineage has caused a divergence in protein sequence that makes similarity-based searches less effective. Structural features are more likely to be conserved than primary amino acid sequence; therefore, we developed a filtration protocol to search for diverged nuclear matrix constituent proteins based on four physical characteristics: intrinsically disordered content, isoelectric point, number of amino acids, and the presence of a central coiled-coil domain. By setting parameters to recognize the properties of bona fide nuclear matrix constituent proteins in angiosperms, we filtered eight complete proteomes from streptophytic algae species and identified strong nuclear matrix constituent protein candidates in six taxa in the Classes Zygnematophyceae, Charophyceae, and Klebsormidiophyceae. Through analysis of these proteins, we observed structural variance in domain size between nuclear matrix constituent proteins in algae and land plants, as well as a single block of amino acid conservation. Our analysis indicates that nuclear matrix constituent proteins are absent in the Mesostigmatophyceae. The presence versus absence of nuclear matrix constituent proteins does not correlate with the distribution of different forms of mitosis (e.g. closed/semi-closed/open) but does correspond to the transition from unicellularity to multicellularity in the streptophytic algae, suggesting that a nuclear matrix constituent protein-based nucleoskeleton plays important roles in supporting cell-to-cell interactions.
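
The four-feature filtration idea described in the abstract can be sketched as a simple threshold filter over precomputed per-protein features. Everything specific here is an assumption for illustration: the class and function names are hypothetical, the feature values are taken as already computed by external tools of the reader's choice, and the threshold numbers are placeholders rather than the parameters the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinFeatures:
    name: str
    length: int                    # number of amino acids
    disordered_fraction: float     # 0..1, from any disorder predictor
    isoelectric_point: float
    has_central_coiled_coil: bool  # from any coiled-coil predictor


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values, NOT the published parameters.
    min_length: int = 800
    max_length: int = 1500
    min_disorder: float = 0.20
    min_pi: float = 4.5
    max_pi: float = 6.5


def passes(p: ProteinFeatures, t: Thresholds) -> bool:
    """True only if the protein satisfies all four criteria."""
    return (
        t.min_length <= p.length <= t.max_length
        and p.disordered_fraction >= t.min_disorder
        and t.min_pi <= p.isoelectric_point <= t.max_pi
        and p.has_central_coiled_coil
    )


def filter_candidates(proteome, t=Thresholds()):
    """Keep the proteins in a proteome that pass every filter."""
    return [p for p in proteome if passes(p, t)]


# Hypothetical toy proteome.
toy_proteome = [
    ProteinFeatures("candidate_A", 1100, 0.35, 5.2, True),
    ProteinFeatures("too_short", 300, 0.40, 5.0, True),
    ProteinFeatures("no_coiled_coil", 1000, 0.30, 5.5, False),
    ProteinFeatures("basic_pI", 1200, 0.30, 9.1, True),
]
print([p.name for p in filter_candidates(toy_proteome)])
```

Requiring all four criteria at once is what lets a filter of this kind stay selective without relying on sequence similarity, which the abstract notes is unreliable for these rapidly evolving proteins.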

Citations: 0
MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies.
IF 3.2 Zone 2 (Biology) Q2 EVOLUTIONARY BIOLOGY Pub Date: 2024-11-01 DOI: 10.1093/gbe/evae235
Gemma I Martínez-Redondo, Carlos Vargas-Chávez, Klara Eleftheriadi, Lisandra Benítez-Álvarez, Marçal Vázquez-Valls, Rosa Fernández

Recent advances in high-throughput sequencing have exponentially increased the amount of genomic data available for animals (Metazoa) in the last decades, with high-quality chromosome-level genomes being published almost daily. Nevertheless, generating a new genome is not an easy task due to the high cost of genome sequencing, the high complexity of assembly, and the lack of standardized protocols for genome annotation. The lack of consensus in the annotation and publication of genome files hinders research: it costs researchers time reformatting the files for their purposes and can also reduce the quality of the genetic repertoire available for an evolutionary study. Thus, the use of transcriptomes obtained using the same pipeline as a proxy for the genetic content of species remains a valuable resource that is easier to obtain, cheaper, and more comparable than genomes. In a previous study, we presented the Metazoan Assemblies from Transcriptomic Ensembles database (MATEdb), a repository of high-quality transcriptomic and genomic data for the two most diverse animal phyla, Arthropoda and Mollusca. Here, we present the newest version of MATEdb (MATEdb2) that overcomes some of the previous limitations of our database: (i) we include data from all animal phyla where public data are available, and (ii) we provide gene annotations extracted from the original GFF genome files using the same pipeline. In total, we provide proteomes inferred from high-quality transcriptomic or genomic data for almost 1,000 animal species, including the longest isoforms, all isoforms, and functional annotation based on sequence homology and protein language models, as well as the embedding representations of the sequences. We believe this new version of MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work in a plea for open, greener, and collaborative science.
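
The abstract mentions providing the longest isoform per gene alongside all isoforms. A minimal sketch of that selection step is shown below, assuming records are already available as (gene ID, isoform ID, protein sequence) tuples. The function name, the record layout, and the tie-breaking rule (first isoform seen wins) are illustrative assumptions, not a description of the MATEdb2 pipeline.

```python
def longest_isoforms(records):
    """Map each gene ID to its longest (isoform_id, sequence) pair.

    `records` is an iterable of (gene_id, isoform_id, sequence) tuples.
    On a length tie, the isoform encountered first is kept.
    """
    best = {}
    for gene_id, isoform_id, seq in records:
        current = best.get(gene_id)
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (isoform_id, seq)
    return best


# Hypothetical toy records.
toy_records = [
    ("gene1", "gene1.t1", "MKT"),
    ("gene1", "gene1.t2", "MKTAYL"),
    ("gene2", "gene2.t1", "MSS"),
    ("gene2", "gene2.t2", "MAA"),  # same length: first one is kept
]
for gene, (isoform, seq) in longest_isoforms(toy_records).items():
    print(gene, isoform, len(seq))
```

Keeping one representative per gene reduces redundancy in orthology inference, while the full isoform set remains useful when alternative splicing itself is of interest.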

Citations: 0