Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families.

IF 1.7 | CAS Zone 4 (Biology) | JCR Q4 (EVOLUTIONARY BIOLOGY) | Evolutionary Bioinformatics, Vol. 16 | Pub Date: 2020-07-09 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320939943
Akshay Yadav, David Fernández-Baca, Steven B Cannon
Citations: 2

Abstract

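Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid-level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors” and treating the species (target + outgroup) as instances, or data points, in a data matrix. We then look for features (domains) that differ significantly between the target species and the outgroup species. We study domain changes in 2 large, distinct groups of plant species, legumes (Fabaceae) and grasses (Poaceae), relative to selected outgroup species.

We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. They are intended to capture different ways in which protein sequences may evolve: gain or loss of domains, an increase or decrease in the copy number of a domain along a sequence, expansion or contraction of domains, or a change in the number of adjacent domain partners. All feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains whose feature values differ significantly in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. In addition, we perform domain-centric gene ontology (dcGO) enrichment analysis on all domains selected from the 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses.

Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for repairing interstrand DNA crosslinks. In legumes, the domain abundance analysis revealed an increase in glutathione synthase, the enzyme that produces glutathione, an antioxidant required for nitrogen fixation, and a decrease in xanthine-oxidizing enzymes, observations confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a Docker container that can be used to perform this analysis workflow on any user-defined set of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.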
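The four feature types can be made concrete with a small sketch. This is a hypothetical illustration, not the authors' pipeline or the code in their Docker container. It assumes each proteome is given as a list of per-protein domain architectures (ordered lists of domain IDs, e.g. Pfam accessions) and adopts one plausible reading of each feature: content = presence/absence, duplication = number of proteins carrying more than one copy of a domain, abundance = number of proteins containing the domain, versatility = number of distinct adjacent domain partners. For the statistical step it swaps in a two-sided Fisher's exact test on the content matrix only; the abstract does not specify which feature-selection methods or tests the authors used. All species and domain names (`legume1`, `DomA`, ...) and helper functions are made up for illustration.

```python
from collections import defaultdict
from math import comb


def domain_features(proteome):
    """Per-domain values of the four (assumed) feature types for one species.

    `proteome` is a list of proteins, each given as its ordered list of
    domain IDs (its domain architecture).
    """
    abundance = defaultdict(int)    # proteins containing the domain
    duplication = defaultdict(int)  # proteins carrying >1 copy of the domain
    partners = defaultdict(set)     # distinct immediately adjacent domains
    for arch in proteome:
        copies = defaultdict(int)
        for dom in arch:
            copies[dom] += 1
        for dom, n in copies.items():
            abundance[dom] += 1
            if n > 1:
                duplication[dom] += 1
        for left, right in zip(arch, arch[1:]):
            if left != right:  # a tandem repeat is not a new partner
                partners[left].add(right)
                partners[right].add(left)
    return {
        "content": {dom: 1 for dom in abundance},
        "duplication": {dom: duplication[dom] for dom in abundance},
        "abundance": dict(abundance),
        "versatility": {dom: len(partners[dom]) for dom in abundance},
    }


def feature_matrix(proteomes, feature):
    """Species x domain matrix for one feature type (absent domain -> 0)."""
    per_species = {sp: domain_features(p)[feature] for sp, p in proteomes.items()}
    domains = sorted(set().union(*per_species.values()))
    rows = {sp: [vals.get(dom, 0) for dom in domains]
            for sp, vals in per_species.items()}
    return domains, rows


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric P(top-left cell == x) with fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Sum all tables no more probable than the observed one.
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))


def content_test(proteomes, targets):
    """Fisher p-value per domain: does presence differ, targets vs outgroups?"""
    domains, rows = feature_matrix(proteomes, "content")
    pvals = {}
    for j, dom in enumerate(domains):
        t = [rows[sp][j] for sp in rows if sp in targets]
        o = [rows[sp][j] for sp in rows if sp not in targets]
        pvals[dom] = fisher_two_sided(sum(t), len(t) - sum(t),
                                      sum(o), len(o) - sum(o))
    return pvals


# Toy input: three "target" and three "outgroup" species (all names made up).
proteomes = {
    "legume1": [["DomA", "DomB"], ["DomC", "DomC"]],
    "legume2": [["DomA"], ["DomC", "DomC", "DomB"]],
    "legume3": [["DomB", "DomA"]],
    "outgroup1": [["DomA", "DomD"], ["DomC"], ["DomD", "DomB"]],
    "outgroup2": [["DomD"], ["DomA", "DomB"]],
    "outgroup3": [["DomD", "DomC"]],
}

for feature in ("content", "duplication", "abundance", "versatility"):
    print(feature, feature_matrix(proteomes, feature))
print(content_test(proteomes, {"legume1", "legume2", "legume3"}))
```

In this toy data, DomD is absent from every target and present in every outgroup, yet the test gives only p = 0.1: with three species per group no split can reach p < 0.05, so a real analysis needs many more species per group and a multiple-testing correction across all domains.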
Source journal: Evolutionary Bioinformatics (Biology: Evolutionary Biology)
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 25
Review time: 12 months
About the journal: Evolutionary Bioinformatics is an open access, peer-reviewed international journal focusing on evolutionary bioinformatics. The journal aims to support understanding of organismal form and function through the use of molecular, genetic, genomic and proteomic data, giving due consideration to their evolutionary context.