
Evolutionary Bioinformatics: Latest Articles

Genome-Wide Identification and Characterization of the SHI-Related Sequence Gene Family in Rice.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-09-11 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320941495
Jun Yang, Peng Xu, Diqiu Yu

Rice (Oryza sativa) yield is correlated to various factors. Transcription regulators are important factors, such as the typical SHORT INTERNODES-related sequences (SRSs), which encode proteins with single zinc finger motifs. Nevertheless, knowledge regarding the evolutionary and functional characteristics of the SRS gene family members in rice is insufficient. Therefore, we performed a genome-wide screening and characterization of the OsSRS gene family in Oryza sativa japonica rice. We also examined the SRS proteins from 11 rice sub-species, consisting of 3 cultivars, 6 wild varieties, and 2 other genome types. SRS members from maize, sorghum, Brachypodium distachyon, and Arabidopsis were also investigated. All these SRS proteins exhibited species-specific characteristics, as well as monocot- and dicot-specific characteristics, as assessed by phylogenetic analysis, which was further validated by gene structure and motif analyses. Genome comparisons revealed that segmental duplications may have played significant roles in the recombination of the OsSRS gene family and their expression levels. The family was mainly subjected to purifying selective pressure. In addition, the expression data demonstrated the distinct responses of OsSRS genes to various abiotic stresses and hormonal treatments, indicating their functional divergence. Our study provides a good reference for elucidating the functions of SRS genes in rice.

Citations: 14
ZMAT2 in Humans and Other Primates: A Highly Conserved and Understudied Gene.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-09-02 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320941500
Kabita Baral, Peter Rotwein

Recent advances in genetics present unique opportunities for enhancing our understanding of human physiology and disease predisposition through detailed analysis of gene structure, expression, and population variation via examination of data in publicly accessible genome and gene expression repositories. Yet, the vast majority of human genes remain understudied. Here, we show the scope of these genomic and genetic resources by evaluating ZMAT2, a member of a 5-gene family that through May 2020 had been the focus of only 4 peer-reviewed scientific publications. Using analysis of information extracted from public databases, we show that human ZMAT2 is a 6-exon gene and find that it exhibits minimal genetic variation in human populations and in disease states, including cancer. We further demonstrate that the gene and its encoded protein are highly conserved among nonhuman primates and define a cohort of ZMAT2 pseudogenes in the marmoset genome. Collectively, our investigations illustrate how complementary use of genomic, gene expression, and population genetic resources can lead to new insights about human and mammalian biology and evolution, and when coupled with data supporting key roles for ZMAT2 in keratinocyte differentiation and pre-RNA splicing argue that this gene is worthy of further study.

Citations: 3
Identification of Metastasis-Associated Genes in Triple-Negative Breast Cancer Using Weighted Gene Co-expression Network Analysis.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-09-01 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320954868
Wenting Xie, Zhongshi Du, Yijie Chen, Naxiang Liu, Zhaoming Zhong, Youhong Shen, Lina Tang

Triple-negative breast cancer (TNBC) is the most aggressive and fatal sub-type of breast cancer. This study aimed to identify metastasis-associated genes that could serve as biomarkers for TNBC diagnosis and prognosis. RNA-seq data and clinical information on TNBC from the Cancer Genome Atlas were used to conduct analyses. Expression data were used to establish co-expression modules using average linkage hierarchical clustering. We used weighted gene co-expression network analysis to explore the associations between gene sets and clinical features and to identify metastasis-associated candidate biomarkers. The K-M plotter website was used to explore the association between the expression of candidate biomarkers and patient survival. In addition, receiver operating characteristic curve analysis was used to illustrate the diagnostic performance of candidate genes. The pale turquoise module was significantly associated with the occurrence of metastasis. In this module, 64 genes were identified, and its functional enrichment analysis revealed that they were mainly associated with transcriptional misregulation in cancer, microRNAs in cancer, and negative regulation of angiogenesis. Further, 4 genes, IGSF10, RUNX1T1, XIST, and TSHZ2, which were negatively associated with relapse-free survival and have seldom been reported before in TNBC, were selected. In addition, the mRNA expression levels of the 4 candidate genes were significantly lower in TNBC tumor tissues compared with healthy tissues. Based on the K-M plotter, these 4 genes were correlated with poor prognosis of TNBC. The area under the curve of IGSF10, RUNX1T1, TSHZ2, and XIST was 0.918, 0.957, 0.977, and 0.749. These findings provide new insight into TNBC metastasis. IGSF10, RUNX1T1, TSHZ2, and XIST could be used as candidate biomarkers for the diagnosis and prognosis of TNBC metastasis.

Citations: 6
Complete Genome and Comparative Genome Analysis of Lactobacillus reuteri YSJL-12, a Potential Probiotics Strain Isolated From Healthy Sow Fresh Feces.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-07-27 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320942192
Su Xu, Jianjun Cheng, Xiangchen Meng, Yan Xu, Ying Mu

Lactobacillus reuteri YSJL-12 was isolated from fresh feces of healthy sows and has previously been used as a probiotic additive. To investigate the genetic basis of its probiotic potential and identify the genes in the strain, the complete genome of YSJL-12 was sequenced. A comparative genome analysis of 9 strains of Lactobacillus reuteri was then performed. The genome of YSJL-12 consisted of a circular 2,084,748 bp chromosome and 2 circular plasmids (51,906 and 15,134 bp). Among the 2065 protein-coding sequences (CDSs), genes conferring resistance to environmental stress were identified. The functions of COG (Clusters of Orthologous Groups) protein genes were predicted, and the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were analyzed. The comparative genome analysis indicated that the pan-genome contained a core genome of 1257 orthologous gene clusters, an accessory genome of 1064 orthologous gene clusters, and 1148 strain-specific genes, and that the antibacterial mechanism might differ among Lactobacillus reuteri strains. The phylogenetic analysis and genomic collinearity revealed that the phylogenetic relationship among the 9 strains of Lactobacillus reuteri was connected with host species and showed host specificity. This research could help to better predict gene function and understand the genetic basis of adaptation to the host gut in Lactobacillus reuteri YSJL-12.
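
The core / accessory / strain-specific split reported in this abstract can be illustrated with a minimal Python sketch. It assumes the usual convention that a cluster present in every strain is core, one present in exactly one strain is strain-specific, and everything else is accessory; the abstract does not state the authors' exact criteria or tooling. The function name, cluster IDs, and strain labels are hypothetical toy values, not data from the paper.

```python
def partition_pan_genome(presence, n_strains):
    """Split orthologous clusters into core, accessory, and strain-specific lists.

    `presence` maps a cluster ID to the set of strains that carry it.
    Assumed convention: all strains -> core, one strain -> strain-specific,
    anything in between -> accessory.
    """
    core, accessory, specific = [], [], []
    for cluster, strains in presence.items():
        if len(strains) == n_strains:
            core.append(cluster)
        elif len(strains) == 1:
            specific.append(cluster)
        else:
            accessory.append(cluster)
    return core, accessory, specific


# Toy input: hypothetical cluster IDs and strain labels.
presence = {
    "og1": {"A", "B", "C"},
    "og2": {"A", "B"},
    "og3": {"C"},
    "og4": {"A", "B", "C"},
}
core, accessory, specific = partition_pan_genome(presence, n_strains=3)
print(len(core), len(accessory), len(specific))  # prints: 2 1 1
```

With real data, the presence map would come from an ortholog clustering step, which this sketch does not attempt.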

Citations: 4
Generation of Cry11 Variants of Bacillus thuringiensis by Heuristic Computational Modeling.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-07-27 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320924681
Efraín Hernando Pinzón-Reyes, Daniel Alfonso Sierra-Bueno, Miguel Orlando Suarez-Barrera, Nohora Juliana Rueda-Forero, Sebastián Abaunza-Villamizar, Paola Rondón-Villareal

Directed evolution methods mimic in vitro Darwinian evolution, inducing random mutations and selective pressure in genes to obtain proteins with enhanced characteristics. These techniques are developed using trial-and-error testing at an experimental level with a high degree of uncertainty. Therefore, in silico modeling of directed evolution is required to support experimental assays. Several in silico approaches have reproduced directed evolution, using statistical, thermodynamic, and kinetic models in an attempt to recreate experimental conditions. Likewise, optimization techniques using heuristic models have been used to understand and find the best scenarios of directed evolution. Our study uses an in silico model named HeurIstics DirecteD EvolutioN, which is based on a genetic algorithm designed to generate chimeric libraries from 2 parental genes, cry11Aa and cry11Ba, of Bacillus thuringiensis. These genes encode crystal-shaped δ-endotoxins with 3 conserved domains. Cry11 toxins are of biotechnological interest because they have shown to be effective as biopesticides for disease-spreading vectors. With our heuristic model, we considered experimental parameters such as DNA fragmentation length, number of generations or simulation cycles, and mutation rate, to get characteristics of Cry11 chimeric libraries such as percentage of population identity, truncation of variants obtained from the presence of internal stop codons, percentage of thermodynamic diversity, and stability of variants. Our study allowed us to focus on experimental conditions that may be useful for the design of in vitro and in silico experiments of directed evolution with Cry toxins of 3 conserved domains. Furthermore, we obtained in silico libraries of Cry11 variants, in which structural characteristics of wild Cry families were observed in a review of a sample of in silico sequences. 
We consider that future studies could use our in silico libraries and heuristic computational models, as the one suggested here, to support in vitro experiments of directed evolution.
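
As a rough illustration of the parameters named in this abstract (fragmentation length, number of generations, mutation rate, internal stop codons), the following Python sketch recombines two parental sequences fragment by fragment, applies point mutations, and flags truncated variants. It is a toy model under stated assumptions, not the authors' HeurIstics DirecteD EvolutioN implementation: no fitness-based selection is modeled, so it is simpler than a full genetic algorithm, and all function names and sequences are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def crossover(parent_a, parent_b, fragment_len, rng):
    """Build a chimera by taking each fragment from one parent at random."""
    length = min(len(parent_a), len(parent_b))
    pieces = []
    for start in range(0, length, fragment_len):
        source = parent_a if rng.random() < 0.5 else parent_b
        pieces.append(source[start:start + fragment_len])
    return "".join(pieces)


def mutate(seq, rate, rng):
    """Substitute each base with a different base with probability `rate`."""
    bases = "ACGT"
    return "".join(
        rng.choice(bases.replace(b, "")) if rng.random() < rate else b
        for b in seq
    )


def has_internal_stop(seq):
    """True if an in-frame stop codon occurs before the final codon.

    Assumes the reading frame starts at position 0 and the length is a
    multiple of 3.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    return any(c in STOP_CODONS for c in codons)


def evolve(parent_a, parent_b, fragment_len, generations,
           mutation_rate, pop_size, seed=0):
    """Run repeated rounds of recombination and mutation (no selection)."""
    rng = random.Random(seed)
    population = [parent_a, parent_b]
    for _ in range(generations):
        next_pop = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            child = mutate(crossover(a, b, fragment_len, rng), mutation_rate, rng)
            next_pop.append(child)
        population = next_pop
    return population


# Hypothetical 18-nt toy parents, not real cry11 sequences.
pa = "ATGAAACCCGGGTTTTGA"
pb = "ATGCCCGGGAAATTTTAA"
library = evolve(pa, pb, fragment_len=6, generations=3,
                 mutation_rate=0.05, pop_size=10)
truncated = sum(has_internal_stop(v) for v in library)
print(len(library), "variants,", truncated, "with an internal stop codon")
```

Varying `fragment_len`, `generations`, and `mutation_rate` in such a loop is one way to see how the share of truncated variants in a simulated library responds to each parameter.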

Citations: 3
Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-07-27 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320944932
Yi-Pin Lai, Thomas R Ioerger

Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step which identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations), compared with (site- or gene-based) GEMMA, as representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.

Citations: 7
Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods.
IF 2.6 · Zone 4 (Biology) · Q4 Evolutionary Biology · Pub Date: 2020-07-20 · eCollection Date: 2020-01-01 · DOI: 10.1177/1176934320915707
Hao Xue, Zhen Wei, Kunqi Chen, Yujiao Tang, Xiangyu Wu, Jionglong Su, Jia Meng

RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification for its role in regulating the stability, structure, processing, and translation of RNA. Instability of m6A homeostasis may result in flaws in stem cell regulation, decrease in fertility, and risk of cancer. To this day, experimental detection and quantification of RNA m6A modification are still time-consuming and labor-intensive. There is only a limited number of epitranscriptome samples in existing databases, and a matched RNA methylation profile is not often available for a biological problem of interest. As gene expression data are usually readily available for most biological problems, it could be appealing if we can estimate the RNA methylation status from gene expression data using in silico methods. In this study, we explored the possibility of computational prediction of RNA methylation status from gene expression data using classification and regression methods based on mouse RNA methylation data collected from 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) were constructed for classification. Both SVM and RF achieved the best performance with the mean area under the curve (AUC) = 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on those sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were found statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All 3 terms were found to be closely related to the m6A pathway. For regression analysis, Elastic Net was implemented, which yielded a mean Pearson correlation coefficient = 0.68 and a mean Spearman correlation coefficient = 0.64. Our exploratory study suggested that gene expression data could be used to construct predictors for m6A methylation status with adequate accuracy. 
Our work showed for the first time that RNA methylation status may be predicted from the matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the very early stage of the study.
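The three evaluation metrics quoted in this abstract (AUC, Pearson correlation, and Spearman correlation) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names and the toy labels and scores are assumptions made up for illustration, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than whatever software the study used.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    r = ranks(scores)
    pos_ranks = [ri for ri, label in zip(r, labels) if label == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    u = sum(pos_ranks) - n_pos * (n_pos + 1) / 2  # Mann-Whitney U statistic
    return u / (n_pos * n_neg)


# Hypothetical toy data: 1 = methylated site, 0 = unmethylated site.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(toy_labels, toy_scores))  # 8 of the 9 positive/negative pairs are ordered correctly
```

In the study these metrics are reported as means across samples, so in practice a function like `auc` would be applied once per sample and the results averaged.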

Citations: 5
Genetic Diversity and Prediction Analysis of Small Isolated Giant Panda Populations After Release of Individuals.
IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY Pub Date: 2020-07-10 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320939945
Qin-Long Dai, Jian-Wei Li, Yi Yang, Min Li, Kan Zhang, Liu-Yang He, Jun Zhang, Bo Tang, Hui-Ping Liu, Yu-Xia Li, Li-Feng Zhu, Zhi-Song Yang, Qiang Dai

Release of individuals is an effective conservation approach for protecting endangered species. To save the small isolated giant panda population in Liziping Nature Reserve, a few giant pandas have been released into this population. Here we assess the genetic diversity of the population and its likely future changes using noninvasive genetic sampling carried out after the releases. In this study, a total of 28 giant pandas (including 4 released individuals) were identified in Liziping, China. Compared with other giant panda populations, this population has medium-level genetic diversity; however, a Bayesian-coalescent method clearly detected, quantified, and dated a recent decrease in population size. Predictions of genetic diversity and population survival over the next 100 years indicate that this population has a high risk of extinction. We show that released giant pandas can preserve genetic diversity and improve the probability of survival in this small isolated population. To promote the recovery of this population, we suggest that panda releases should be continued and that 10 males and 20 females will need to be released into this population in the future.

Citations: 5
Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families.
IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY Pub Date: 2020-07-09 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320939943
Akshay Yadav, David Fernández-Baca, Steven B Cannon
Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix. We then look for features (domains) that differ significantly between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species, legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These 4 matrix types attempt to capture different aspects of the domain changes through which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, and changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. In addition, we perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses. Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in glutathione synthase enzyme, an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.
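The idea of treating domains as features and species as rows of a data matrix can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the species names, domain IDs, and the exact definitions used here (content as presence/absence, abundance as the number of proteins containing a domain, versatility as the number of distinct adjacent domain partners) are hypothetical simplifications, and the duplication matrix is omitted.

```python
from collections import defaultdict

# Hypothetical toy input: species -> proteins, each protein an ordered list of domain IDs.
proteomes = {
    "legume_A": [["D1", "D2"], ["D2", "D3", "D2"]],
    "grass_B": [["D1"], ["D1", "D3"]],
}


def domain_features(proteins):
    """Return (content, abundance, versatility) dicts for one species."""
    content = {}                  # 1 if the domain occurs in any protein
    abundance = defaultdict(int)  # number of proteins containing the domain
    partners = defaultdict(set)   # distinct neighbouring domains
    for domains in proteins:
        for d in set(domains):
            content[d] = 1
            abundance[d] += 1
        # Adjacent pairs along the sequence define domain partners.
        for left, right in zip(domains, domains[1:]):
            if left != right:
                partners[left].add(right)
                partners[right].add(left)
    versatility = {d: len(partners[d]) for d in content}
    return content, dict(abundance), versatility


def feature_matrix(proteomes, which):
    """Build a species-by-domain matrix; which = 0 content, 1 abundance, 2 versatility."""
    per_species = {s: domain_features(p)[which] for s, p in proteomes.items()}
    columns = sorted({d for feats in per_species.values() for d in feats})
    rows = {s: [feats.get(d, 0) for d in columns] for s, feats in per_species.items()}
    return columns, rows


columns, content_rows = feature_matrix(proteomes, 0)
print(columns)
for species, row in content_rows.items():
    print(species, row)
```

With matrices in this shape, each column can be compared between target and outgroup rows using whichever feature selection method or statistical test is preferred.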
Citations: 2
Using Random Forest Model Combined With Gabor Feature to Predict Protein-Protein Interaction From Protein Sequence.
IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY Pub Date: 2020-06-30 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320934498
Xin-Ke Zhan, Zhu-Hong You, Li-Ping Li, Yang Li, Zheng Wang, Jie Pan

Protein-protein interactions (PPIs) play a crucial role in the life cycles of living cells. Thus, it is important to understand the underlying mechanisms of PPIs. Although many high-throughput technologies have generated large amounts of PPI data in different organisms, the experiments for detecting PPIs are still costly and time-consuming. Novel computational methods for predicting PPIs are therefore urgently needed and are drawing increasing attention. In this study, we proposed a novel computational method for predicting PPIs based on texture features of protein sequences. Specifically, the Gabor feature is used to extract texture features and protein evolutionary information from the Position-Specific Scoring Matrix, which is generated by the Position-Specific Iterated Basic Local Alignment Search Tool. Then, random forest-based classifiers are used to infer the protein interactions. When applied to PPI data sets of yeast, human, and Helicobacter pylori, the method obtained good results, with average accuracies of 92.10%, 97.03%, and 86.45%, respectively. To better evaluate the proposed method, we compared the Gabor feature with Discrete Cosine Transform and Local Phase Quantization. Our results show that the proposed method is both feasible and stable and that the Gabor feature descriptor is reliable in extracting protein sequence information. Furthermore, additional experiments were conducted to predict PPIs in 4 other species data sets. The promising results indicate that our proposed method is both powerful and robust.
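As a rough illustration of how a Gabor filter can turn a score matrix into texture features, the sketch below builds the real part of a standard 2D Gabor kernel, slides it over a toy matrix, and summarises each orientation's response by its mean and standard deviation. Everything specific here is an assumption: the abstract does not give the kernel size, scales, orientations, or the summary statistics the authors used, the toy matrix is far smaller than a real Position-Specific Scoring Matrix, and the random forest step is not shown.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor kernel; size is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def filter_valid(matrix, kernel):
    """Slide the kernel over the matrix, keeping only fully overlapping positions."""
    k = len(kernel)
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(matrix[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(cols - k + 1)
        ]
        for i in range(rows - k + 1)
    ]


def gabor_features(matrix, thetas, size=3, sigma=1.0, lambd=2.0):
    """Mean and standard deviation of the filter response for each orientation."""
    feats = []
    for theta in thetas:
        response = filter_valid(matrix, gabor_kernel(size, sigma, theta, lambd))
        flat = [v for row in response for v in row]
        m = sum(flat) / len(flat)
        sd = (sum((v - m) ** 2 for v in flat) / len(flat)) ** 0.5
        feats.extend([m, sd])
    return feats


# Hypothetical 5 x 4 stand-in for a PSSM (a real one has 20 columns per residue).
toy_pssm = [
    [1, -2, 0, 3],
    [0, 4, -1, 2],
    [-3, 1, 2, 0],
    [2, 0, -2, 1],
    [1, 3, 0, -1],
]
thetas = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
features = gabor_features(toy_pssm, thetas)
print(len(features))  # two statistics per orientation
```

A fixed-length vector like `features`, computed for each protein in a pair, is the kind of input a downstream classifier could consume.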

Citations: 9