Evolutionary Bioinformatics: latest articles, page 10

Generation of Cry11 Variants of Bacillus thuringiensis by Heuristic Computational Modeling.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-07-27 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320924681

Efraín Hernando Pinzón-Reyes, Daniel Alfonso Sierra-Bueno, Miguel Orlando Suarez-Barrera, Nohora Juliana Rueda-Forero, Sebastián Abaunza-Villamizar, Paola Rondón-Villareal

Directed evolution methods mimic Darwinian evolution in vitro, inducing random mutations and applying selective pressure to genes to obtain proteins with enhanced characteristics. These techniques are developed through trial-and-error testing at an experimental level with a high degree of uncertainty. Therefore, in silico modeling of directed evolution is required to support experimental assays. Several in silico approaches have reproduced directed evolution, using statistical, thermodynamic, and kinetic models in an attempt to recreate experimental conditions. Likewise, optimization techniques using heuristic models have been used to understand and find the best scenarios of directed evolution. Our study uses an in silico model named HeurIstics DirecteD EvolutioN, which is based on a genetic algorithm designed to generate chimeric libraries from 2 parental genes, cry11Aa and cry11Ba, of Bacillus thuringiensis. These genes encode crystal-shaped δ-endotoxins with 3 conserved domains. Cry11 toxins are of biotechnological interest because they have been shown to be effective as biopesticides against disease-spreading vectors. With our heuristic model, we considered experimental parameters such as DNA fragmentation length, number of generations or simulation cycles, and mutation rate, to obtain characteristics of Cry11 chimeric libraries such as percentage of population identity, truncation of variants caused by the presence of internal stop codons, percentage of thermodynamic diversity, and stability of variants. Our study allowed us to focus on experimental conditions that may be useful for the design of in vitro and in silico experiments of directed evolution with Cry toxins of 3 conserved domains. Furthermore, we obtained in silico libraries of Cry11 variants, in which structural characteristics of wild Cry families were observed in a review of a sample of in silico sequences. We consider that future studies could use our in silico libraries and heuristic computational models, such as the one suggested here, to support in vitro experiments of directed evolution.
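The parameters named in this abstract (fragment length, number of generations, mutation rate) and one of its reported library characteristics (truncation by internal stop codons) can be illustrated with a toy simulation. The sketch below is not the authors' model: the function names, the equal-length parents, and the absence of any selection step are all assumptions made for illustration.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(dna: str) -> bool:
    """Return True if an in-frame stop codon occurs before the final codon."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def recombine(parent_a: str, parent_b: str, fragment_len: int, rng: random.Random) -> str:
    """Build a chimera by taking each fragment from one parent chosen at random.

    Assumes (for illustration only) that both parents have the same length.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("toy model expects equal-length parents")
    pieces = []
    for start in range(0, len(parent_a), fragment_len):
        source = rng.choice((parent_a, parent_b))
        pieces.append(source[start:start + fragment_len])
    return "".join(pieces)


def mutate(dna: str, rate: float, rng: random.Random) -> str:
    """Redraw each base uniformly from ACGT with probability `rate`."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else base for base in dna)


def truncated_fraction(parent_a, parent_b, fragment_len, generations, rate, size, seed=0):
    """Fraction of the final library that carries an internal stop codon."""
    rng = random.Random(seed)
    population = [parent_a, parent_b]
    for _ in range(generations):
        next_population = []
        for _ in range(size):
            first, second = rng.sample(population, 2)
            child = recombine(first, second, fragment_len, rng)
            next_population.append(mutate(child, rate, rng))
        population = next_population
    return sum(has_internal_stop(seq) for seq in population) / len(population)
```

Calling `truncated_fraction` over a grid of fragment lengths and mutation rates gives a rough feel for how those settings trade diversity against truncated variants; `size` must be at least 2 so that two distinct parents can be sampled in each generation.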


Citations: 3

Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-07-27 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320944932

Yi-Pin Lai, Thomas R Ioerger

Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step which identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations), compared with (site- or gene-based) GEMMA, as representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.
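As a rough illustration of the homoplasy idea described above, the following sketch counts independent mutation events per site and locates the densest window of events. It assumes the events have already been inferred from a phylogeny as hypothetical `(site, branch_id)` pairs, and it omits the significance testing and the LMM association step; it is not the ECAT implementation.

```python
from collections import defaultdict


def homoplasic_sites(events, min_events=2):
    """Map each site to its number of independent events, keeping sites with >= min_events.

    `events` is an iterable of (site, branch_id) pairs; two events at the same
    site on different branches are treated as independent.
    """
    branches_by_site = defaultdict(set)
    for site, branch in events:
        branches_by_site[site].add(branch)
    return {
        site: len(branches)
        for site, branches in branches_by_site.items()
        if len(branches) >= min_events
    }


def densest_window(events, window):
    """Return (start_site, count) for the window of `window` sites holding the most distinct events."""
    sites = [site for site, _ in sorted(set(events))]
    best_start, best_count = None, 0
    left = 0
    for right in range(len(sites)):
        # Shrink from the left until the window spans fewer than `window` positions.
        while sites[right] - sites[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_start, best_count = sites[left], count
    return best_start, best_count
```

A region flagged this way would still need a statistical test against the background mutation density before being passed to association testing.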


Citations: 7

Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-07-20 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320915707

Hao Xue, Zhen Wei, Kunqi Chen, Yujiao Tang, Xiangyu Wu, Jionglong Su, Jia Meng

RNA N⁶-methyladenosine (m⁶A) has emerged as an important epigenetic modification for its role in regulating the stability, structure, processing, and translation of RNA. Instability of m⁶A homeostasis may result in flaws in stem cell regulation, decrease in fertility, and risk of cancer. To this day, experimental detection and quantification of RNA m⁶A modification are still time-consuming and labor-intensive. There is only a limited number of epitranscriptome samples in existing databases, and a matched RNA methylation profile is often not available for a biological problem of interest. As gene expression data are usually readily available for most biological problems, it would be appealing to estimate the RNA methylation status from gene expression data using in silico methods. In this study, we explored the possibility of computational prediction of RNA methylation status from gene expression data using classification and regression methods based on mouse RNA methylation data collected from 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) were constructed for classification. Both SVM and RF achieved the best performance with the mean area under the curve (AUC) = 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on those sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were found statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All 3 terms were found to be closely related to the m⁶A pathway. For regression analysis, Elastic Net was implemented, which yielded a mean Pearson correlation coefficient = 0.68 and a mean Spearman correlation coefficient = 0.64. Our exploratory study suggested that gene expression data could be used to construct predictors for m⁶A methylation status with adequate accuracy. Our work showed for the first time that RNA methylation status may be predicted from the matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the very early stage of the study.
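The "mean AUC across samples" reported above can be made concrete with a small sketch. It computes AUC as the probability that a randomly chosen positive site is scored above a randomly chosen negative one (ties count as one half), then averages over samples. The function names and the input layout are assumptions for illustration, not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` uses 1 for a methylated site and 0 for an unmethylated one.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def mean_auc(per_sample):
    """Average the AUC over an iterable of (scores, labels) pairs, one pair per sample."""
    values = [auc(scores, labels) for scores, labels in per_sample]
    return sum(values) / len(values)
```

The pairwise form is quadratic in the number of sites, which is fine for a sketch; a rank-based formulation would be preferable for large site sets.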


Citations: 5

Genetic Diversity and Prediction Analysis of Small Isolated Giant Panda Populations After Release of Individuals.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-07-10 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320939945

Qin-Long Dai, Jian-Wei Li, Yi Yang, Min Li, Kan Zhang, Liu-Yang He, Jun Zhang, Bo Tang, Hui-Ping Liu, Yu-Xia Li, Li-Feng Zhu, Zhi-Song Yang, Qiang Dai

Release of individuals is an effective conservation approach to protect endangered species. To save the small, isolated giant panda population in Liziping Nature Reserve, a few giant pandas have been released into this population. Here we assess genetic diversity and future changes in the population using noninvasive genetic sampling after the release of giant pandas. In this study, a total of 28 giant pandas (including 4 released individuals) were identified in Liziping, China. Compared with other giant panda populations, this population has medium-level genetic diversity; however, a Bayesian-coalescent method clearly detected, quantified, and dated a recent decrease in population size. The predictions for genetic diversity and survival of the population in the next 100 years indicate that this population has a high risk of extinction. We show that released giant pandas can preserve genetic diversity and improve the probability of survival in this small isolated giant panda population. To promote the recovery of this population, we suggest that panda release should be continued and that 10 males and 20 females will need to be released into this population in the future.


Citations: 5

Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-07-09 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320939943

Akshay Yadav, David Fernández-Baca, Steven B Cannon

Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix. We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. The 4 types of domain feature matrices attempt to capture different aspects of domain changes through which the protein sequences may evolve; that is, via gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, or through changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. In addition, we also perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses. Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in glutathione synthase enzyme, an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.
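The "domain content" matrix described above (species as rows, domains as presence/absence features) can be sketched in a few lines. The example swaps the authors' feature selection and statistical tests for a deliberately strict rule: a domain is called gained if every target species has it and no outgroup species does, and lost in the opposite case. The function names and the input layout are assumptions for illustration.

```python
def domain_content_matrix(species_domains):
    """Build a presence/absence matrix from a {species: set_of_domains} mapping.

    Returns the sorted domain list (column order) and a {species: row} mapping.
    """
    domains = sorted(set().union(*species_domains.values()))
    matrix = {
        species: [int(domain in present) for domain in domains]
        for species, present in species_domains.items()
    }
    return domains, matrix


def family_specific(species_domains, targets, outgroup):
    """Return (gained, lost) domain lists under a strict all-or-none rule."""
    domains, matrix = domain_content_matrix(species_domains)
    gained, lost = [], []
    for column, domain in enumerate(domains):
        in_targets = [matrix[species][column] for species in targets]
        in_outgroup = [matrix[species][column] for species in outgroup]
        if all(in_targets) and not any(in_outgroup):
            gained.append(domain)
        elif not any(in_targets) and all(in_outgroup):
            lost.append(domain)
    return gained, lost
```

The other three matrix types named in the abstract (duplication, abundance, versatility) would replace the 0/1 entries with counts, which is where a statistical test becomes more informative than an all-or-none rule.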


Citations: 2

Using Random Forest Model Combined With Gabor Feature to Predict Protein-Protein Interaction From Protein Sequence.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-06-30 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320934498

Xin-Ke Zhan, Zhu-Hong You, Li-Ping Li, Yang Li, Zheng Wang, Jie Pan

Protein-protein interactions (PPIs) play a crucial role in the life cycles of living cells. Thus, it is important to understand the underlying mechanisms of PPIs. Although many high-throughput technologies have generated large amounts of PPI data in different organisms, the experiments for detecting PPIs are still costly and time-consuming. Therefore, novel computational methods are urgently needed for predicting PPIs, and their development is drawing more and more attention. In this study, we proposed a novel computational method for predicting PPIs based on texture features of protein sequences. Specifically, the Gabor feature is used to extract texture features and protein evolutionary information from the Position-Specific Scoring Matrix (PSSM), which is generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then, random forest-based classifiers are used to infer the protein interactions. When applied to PPI data sets of yeast, human, and Helicobacter pylori, the method obtained good results with average accuracies of 92.10%, 97.03%, and 86.45%, respectively. To better evaluate the proposed method, we compared the Gabor feature with Discrete Cosine Transform and Local Phase Quantization. Our results show that the proposed method is both feasible and stable and the Gabor feature descriptor is reliable in extracting protein sequence information. Furthermore, additional experiments have been conducted to predict PPIs on data sets from 4 other species. The promising results indicate that our proposed method is both powerful and robust.
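To show what a Gabor texture feature on a score matrix can look like, the sketch below builds the real part of a standard 2D Gabor kernel and slides it over a matrix. The kernel formula is the textbook one; the parameter choices, the function names, and the way responses would be summarized into a feature vector are assumptions, not the descriptor used in the paper.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor kernel on a size x size grid (size should be odd).

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/wavelength + psi),
    where (x', y') are the coordinates rotated by theta.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x_rot / wavelength + psi))
        kernel.append(row)
    return kernel


def filter_valid(matrix, kernel):
    """Slide a square kernel over the matrix, keeping only positions where it fits entirely."""
    k = len(kernel)
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(matrix[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(cols - k + 1)
        ]
        for i in range(rows - k + 1)
    ]
```

Applying `filter_valid` to a sequence-length by 20 score matrix for several values of `theta` and `wavelength`, then taking the mean and variance of each response map, is one common way to turn the responses into a fixed-length vector for a classifier.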



Citations: 9

A Novel Cecropin D-Derived Short Cationic Antimicrobial Peptide Exhibits Antibacterial Activity Against Wild-Type and Multidrug-Resistant Strains of Klebsiella pneumoniae and Pseudomonas aeruginosa.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-06-26 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320936266

Iván Darío Ocampo-Ibáñez, Yamil Liscano, Sandra Patricia Rivera-Sánchez, José Oñate-Garzón, Ashley Dayan Lugo-Guevara, Liliana Janeth Flórez-Elvira, Maria Cristina Lesmes

Infections caused by multidrug-resistant (MDR) Pseudomonas aeruginosa and Klebsiella pneumoniae are a serious worldwide public health concern due to the ineffectiveness of empirical antibiotic therapy. Therefore, research and the development of new antibiotic alternatives are urgently needed to control these bacteria. The use of cationic antimicrobial peptides (CAMPs) is a promising alternative therapeutic strategy to antibiotics because they exhibit antibacterial activity against both antibiotic-susceptible and MDR strains. In this study, we aimed to investigate the in vitro antibacterial effect of a short synthetic CAMP derived from the ΔM2 analog of Cec D-like (CAMP-CecD) against clinical isolates of K pneumoniae (n = 30) and P aeruginosa (n = 30), as well as its hemolytic activity. Minimal inhibitory concentrations (MICs) and minimal bactericidal concentrations (MBCs) of CAMP-CecD against wild-type and MDR strains were determined by the broth microdilution test. In addition, an in silico molecular dynamics simulation was performed to predict the interaction between CAMP-CecD and membrane models of K pneumoniae and P aeruginosa. The results revealed a bactericidal effect of CAMP-CecD against both wild-type and resistant strains, but MDR P aeruginosa showed higher susceptibility to this peptide, with MIC values between 32 and >256 μg/mL. CAMP-CecD showed higher stability in the P aeruginosa membrane model than in the K pneumoniae model due to the greater number of noncovalent interactions with the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-(phospho-rac-(1-glycerol)) (POPG). This may be related to the greater effectiveness of the peptide against P aeruginosa clinical isolates. Given the antibacterial activity of CAMP-CecD against wild-type and MDR clinical isolates of P aeruginosa and K pneumoniae and its nonhemolytic effects on human erythrocytes, CAMP-CecD may be a promising alternative to conventional antibiotics.
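As a small illustration of how an MIC is read from a broth microdilution series, the sketch below takes the lowest concentration at and above which no visible growth is observed, and reports a right-censored value such as ">256" when growth occurs in every well. This follows the conventional definition of the MIC and is not the authors' protocol; the twofold dilution series, the function name `read_mic`, and the growth pattern are hypothetical.

```python
def read_mic(wells):
    """Return the MIC as a string.

    wells maps concentration (in μg/mL) to True if visible growth was observed.
    The MIC is the lowest concentration at and above which no growth is seen.
    """
    mic = None
    # Walk down from the highest concentration until growth is first observed.
    for conc in sorted(wells, reverse=True):
        if wells[conc]:
            break
        mic = conc
    # Growth even at the highest concentration: report a censored value.
    return f">{max(wells)}" if mic is None else str(mic)


# Hypothetical twofold dilution series from 1 to 256 μg/mL.
concentrations = [2 ** k for k in range(9)]
example = {c: c < 32 for c in concentrations}  # growth only below 32 μg/mL

print(read_mic(example))
print(read_mic({c: True for c in concentrations}))
```

The censored case corresponds to the upper end of the range quoted in the abstract, where the highest tested concentration did not inhibit growth.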



Citations: 8

Inferring Causation in Yeast Gene Association Networks With Kernel Logistic Regression.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-06-24 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320920310

Amira Al-Aamri, Kamal Taha, Maher Maalouf, Andrzej Kudlicki, Dirar Homouz

Computational prediction of gene-gene associations is a productive direction in bioinformatics. Many tools have been developed to infer the relation between genes using different biological data sources. The association of a pair of genes deduced from the analysis of biological data becomes meaningful when it reflects the directionality and the type of reaction between genes. In this work, we follow another method to construct a causal gene co-expression network while identifying transcription factors in each pair of genes using microarray expression data. We adopt a machine learning technique based on a logistic regression model to tackle the sparsity of the network and to improve prediction accuracy. The proposed system classifies each pair of genes as either connected or nonconnected using data on the correlation between these genes in the whole Saccharomyces cerevisiae genome. The accuracy of the classification model in predicting related genes was evaluated using several data sets for the yeast regulatory network. Our system achieves high performance in terms of several statistical measures.
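To make the classification step concrete, here is a minimal sketch of kernel logistic regression on a toy problem. Each gene pair is represented by a single hypothetical feature, its expression correlation, and a pair is labeled connected when the correlation is strongly positive or strongly negative; a plain linear logistic model cannot separate such labels, whereas an RBF kernel can. The data, the kernel width, the regularization strength, and the gradient-descent settings are all assumptions for illustration and are not taken from the paper.

```python
import math


def rbf(a, b, gamma=10.0):
    """RBF kernel between two scalar features."""
    return math.exp(-gamma * (a - b) ** 2)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_klr(xs, ys, lam=0.01, lr=0.1, steps=500):
    """Fit dual coefficients alpha and a bias by gradient descent on the
    regularized logistic loss  sum_i loss(f(x_i), y_i) + (lam/2) alpha^T K alpha,
    where f(x) = sum_j alpha_j * k(x_j, x) + bias."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    alpha, bias = [0.0] * n, 0.0
    for _ in range(steps):
        scores = [sum(K[i][j] * alpha[j] for j in range(n)) + bias for i in range(n)]
        resid = [sigmoid(s) - y for s, y in zip(scores, ys)]
        # Gradient with respect to alpha is K (resid + lam * alpha).
        inner = [resid[i] + lam * alpha[i] for i in range(n)]
        grad = [sum(K[i][j] * inner[j] for j in range(n)) for i in range(n)]
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
        bias -= lr * sum(resid)
    return alpha, bias


def predict(x, xs, alpha, bias):
    """Estimated probability that a pair with correlation x is connected."""
    return sigmoid(sum(a * rbf(x, xj) for a, xj in zip(alpha, xs)) + bias)


# Hypothetical training pairs: correlation value and connected (1) / not (0).
correlations = [-0.9, -0.8, -0.7, -0.2, -0.1, 0.0, 0.1, 0.2, 0.7, 0.8, 0.9]
connected = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]

alpha, bias = fit_klr(correlations, connected)
for x in (-0.8, 0.0, 0.8):
    print(x, round(predict(x, correlations, alpha, bias), 3))
```

Working in the dual (one coefficient per training pair) keeps the example short, but it scales quadratically with the number of pairs; for genome-scale data a sparse or approximate kernel method would likely be needed.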



Citations: 3

Descent of Bacteria and Eukarya From an Archaeal Root of Life.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-06-23 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320908267

Xi Long, Hong Xue, J Tze-Fei Wong

The 3 biological domains delineated based on small subunit ribosomal RNAs (SSU rRNAs) are confronted by uncertainties regarding the relationship between Archaea and Bacteria, and the origin of Eukarya. The similarities between the paralogous valyl-tRNA and isoleucyl-tRNA synthetases in 5398 species estimated by BLASTP, which decreased from Archaea to Bacteria and further to Eukarya, were consistent with vertical gene transmission from an archaeal root of life close to Methanopyrus kandleri through a Primitive Archaea Cluster to an Ancestral Bacteria Cluster, and to Eukarya. The predominant similarities of the ribosomal proteins (rProts) of eukaryotes toward archaeal rProts relative to bacterial rProts established that an archaeal parent rather than a bacterial parent underwent genome merger with bacteria to generate eukaryotes with mitochondria. Eukaryogenesis benefited from the predominantly archaeal accelerated gene adoption (AGA) phenotype pertaining to horizontally transferred genes from other prokaryotes and expedited genome evolution via both gene-content mutations and nucleotidyl mutations. Archaeons endowed with substantial AGA activity were accordingly favored as candidate archaeal parents. Based on the top similarity bitscores displayed by their proteomes toward the eukaryotic proteomes of Giardia and Trichomonas, and high AGA activity, the Aciduliprofundum archaea were identified as leading candidates of the archaeal parent. The Asgard archaeons and a number of bacterial species were among the foremost potential contributors of eukaryotic-like proteins to Eukarya.



Citations: 13

Identification of Transposable Elements in Conifer and Their Potential Application in Breeding.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2020-06-15 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320930263

Junhui Wang, Nan Lu, Fei Yi, Yao Xiao

Transposable elements (TEs) are known to play a role in genome evolution, gene regulation, and epigenetics, and represent potential tools for genetic research on, and breeding of, conifers. Recently, thanks to the development of high-throughput sequencing, more conifer genomes have been reported. Using bioinformatics tools, we identified the TEs of 3 important conifers (Picea abies, Picea glauca, and Pinus taeda) in our previous study, which provided a foundation for accelerating the use of TEs in conifer breeding and genetic studies. Here, we review recent studies on the functional biology of TEs and discuss their potential applications in conifers.


Citations: 8