Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families.

Evolutionary Bioinformatics (IF 1.5; Zone 4, Biology; JCR Q4, Evolutionary Biology). Pub Date: 2020-07-09. eCollection Date: 2020-01-01. DOI: 10.1177/1176934320939943

Akshay Yadav, David Fernández-Baca, Steven B Cannon


Citations: 2

Abstract



Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid-level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as "features" or "descriptors," and considering the species (target + outgroup) as instances or data points in a data matrix. We then look for features (domains) that differ significantly between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These 4 matrix types attempt to capture different aspects of the domain changes through which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, and changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. We also perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses.
Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in glutathione synthase enzyme, an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a Docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.
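
To make the "domains as features" idea concrete, the following is a minimal sketch and not the authors' workflow. It assumes each species is given as a list of proteins, each protein being its domain IDs in sequence order (a hypothetical input format with toy domain names). The four feature definitions are one plausible reading of the abstract, and Fisher's exact test is a stand-in for the unspecified statistical tests, applied here to the presence/absence (content) feature only.

```python
from math import comb


def domain_features(architectures):
    """Compute four per-domain feature values for one species.

    `architectures` is a list of proteins, each given as the list of its
    domain IDs in sequence order (hypothetical input format).
    """
    abundance = {}    # number of proteins that contain the domain
    duplication = {}  # largest copy number of the domain within one protein
    partners = {}     # distinct domains found directly next to the domain
    for arch in architectures:
        for dom in set(arch):
            abundance[dom] = abundance.get(dom, 0) + 1
            duplication[dom] = max(duplication.get(dom, 0), arch.count(dom))
        for left, right in zip(arch, arch[1:]):
            if left != right:
                partners.setdefault(left, set()).add(right)
                partners.setdefault(right, set()).add(left)
    return {
        "content": {dom: 1 for dom in abundance},  # presence flag
        "duplication": duplication,
        "abundance": abundance,
        "versatility": {dom: len(partners.get(dom, ())) for dom in abundance},
    }


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: target species with / without the domain
    c, d: outgroup species with / without the domain
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability of x target species carrying the domain.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy species with three proteins; domain names are made up.
toy_species = [["A", "B", "A"], ["A", "C"], ["B"]]
features = domain_features(toy_species)

# Domain absent from all 4 target species, present in all 4 outgroup species.
p_value = fisher_exact_two_sided(0, 4, 4, 0)
```

In this toy case the content test gives p = 2/70 (about 0.029). With many domains tested at once, a multiple-testing correction would normally be applied on top of such per-domain p-values; count-valued features such as abundance would need a different test than the one sketched here.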


Source journal

Evolutionary Bioinformatics (Biology: Evolutionary Biology)

CiteScore: 4.20

Self-citation rate: 0.00%

Review time: 12 months

About the journal: Evolutionary Bioinformatics is an open access, peer-reviewed international journal focusing on evolutionary bioinformatics. The journal aims to support understanding of organismal form and function through the use of molecular, genetic, genomic, and proteomic data, giving due consideration to its evolutionary context.