Pub Date: 2020-07-27 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320924681
Efraín Hernando Pinzón-Reyes, Daniel Alfonso Sierra-Bueno, Miguel Orlando Suarez-Barrera, Nohora Juliana Rueda-Forero, Sebastián Abaunza-Villamizar, Paola Rondón-Villareal
Directed evolution methods mimic Darwinian evolution in vitro, introducing random mutations into genes and applying selective pressure to obtain proteins with enhanced characteristics. These techniques rely on experimental trial-and-error testing with a high degree of uncertainty. Therefore, in silico modeling of directed evolution is needed to support experimental assays. Several in silico approaches have reproduced directed evolution using statistical, thermodynamic, and kinetic models in an attempt to recreate experimental conditions. Likewise, optimization techniques based on heuristic models have been used to understand directed evolution and to find its best scenarios. Our study uses an in silico model named HeurIstics DirecteD EvolutioN, which is based on a genetic algorithm designed to generate chimeric libraries from 2 parental genes of Bacillus thuringiensis, cry11Aa and cry11Ba. These genes encode crystal-shaped δ-endotoxins with 3 conserved domains. Cry11 toxins are of biotechnological interest because they have been shown to be effective biopesticides against disease-spreading vectors. With our heuristic model, we considered experimental parameters such as DNA fragmentation length, number of generations (simulation cycles), and mutation rate, and obtained characteristics of the Cry11 chimeric libraries such as percentage of population identity, truncation of variants caused by internal stop codons, percentage of thermodynamic diversity, and stability of variants. Our study allowed us to focus on experimental conditions that may be useful for designing in vitro and in silico directed evolution experiments with 3-domain Cry toxins. Furthermore, we obtained in silico libraries of Cry11 variants; a review of a sample of these sequences showed structural characteristics of wild-type Cry families. We consider that future studies could use our in silico libraries and heuristic computational models, such as the one suggested here, to support in vitro directed evolution experiments.
Title: Generation of Cry11 Variants of Bacillus thuringiensis by Heuristic Computational Modeling | Journal: Evolutionary Bioinformatics 16: 1176934320924681
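The model parameters named above (fragmentation length, number of generations, mutation rate) and two of the library statistics (population identity and truncation by internal stop codons) can be illustrated with a toy genetic-algorithm loop. This is a minimal sketch under stated assumptions, not the HeurIstics DirecteD EvolutioN implementation: the parents are short made-up sequences rather than cry11Aa and cry11Ba, recombination is modeled as fixed-length fragment swapping between aligned, equal-length parents, and the selection step (discarding truncated variants) is an assumption. All function names are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def shuffle_fragments(parent_a, parent_b, fragment_len, rng):
    """Build one chimera by taking each fixed-length fragment from a randomly chosen parent."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes aligned, equal-length parents")
    pieces = []
    for start in range(0, len(parent_a), fragment_len):
        donor = parent_a if rng.random() < 0.5 else parent_b
        pieces.append(donor[start:start + fragment_len])
    return "".join(pieces)


def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` (uniform point mutations)."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )


def has_internal_stop(seq):
    """True if an in-frame stop codon appears before the last codon (a truncated variant)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def simulate_library(parent_a, parent_b, fragment_len, generations, mutation_rate,
                     library_size, seed=0):
    """Run the toy GA and summarize the final library (hypothetical helper)."""
    if generations < 1 or library_size < 1:
        raise ValueError("need at least one generation and one variant")
    rng = random.Random(seed)
    population = [parent_a, parent_b]
    for _generation in range(generations):
        library = []
        for _variant in range(library_size):
            pa, pb = rng.sample(population, 2)
            chimera = shuffle_fragments(pa, pb, fragment_len, rng)
            library.append(mutate(chimera, mutation_rate, rng))
        # Assumed selection step: only full-length variants seed the next generation.
        survivors = [s for s in library if not has_internal_stop(s)]
        if len(survivors) >= 2:
            population = survivors
    return {
        "truncated_fraction": sum(has_internal_stop(s) for s in library) / len(library),
        "mean_identity_to_parent_a": sum(identity(s, parent_a) for s in library) / len(library),
    }


if __name__ == "__main__":
    # Made-up 21-nt parents (7 codons, stop codon only at the end); not cry11 sequences.
    a = "ATGGCTGCAAAACTGGGTTAA"
    b = "ATGGCAGCGAAGCTCGGCTAA"
    print(simulate_library(a, b, fragment_len=6, generations=5,
                           mutation_rate=0.01, library_size=50))
```

Raising `mutation_rate` in this toy typically increases the truncated fraction and lowers identity to the parents, which is the kind of trade-off the abstract describes exploring.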
Pub Date: 2020-07-27 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320944932
Yi-Pin Lai, Thomas R Ioerger
Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously in a clinical population by a mixture of resistance mutations (some more frequent than others). This presents a challenge for genome-wide association study (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step that identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes; association testing with a linear mixed model (LMM) is then applied to these regions using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to clustered regions of mutations. The approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT detects known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations) better than site- or gene-based GEMMA, taken as a representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces model complexity and increases statistical power.
Title: Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes | Journal: Evolutionary Bioinformatics 16: 1176934320944932
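The ECAT preprocessing idea, flagging genomic regions that carry more independent evolutionary changes than expected, can be sketched as a window scan. This is an illustrative assumption, not ECAT's actual statistic: it takes genome positions of distinct evolutionary changes (for example, one entry per independent mutation event inferred on a phylogeny), tiles the genome into fixed-width windows, and flags windows whose event count is improbably high under a uniform Poisson background with a Bonferroni correction. The GEMMA association step is not reproduced; the last helper only shows how isolates could be collapsed to a per-region genotype that a downstream LMM could test. All names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with terms computed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def hypervariable_windows(event_positions, genome_length, window, alpha=0.05):
    """Return (start, end, count, p) for windows with a significant excess of events."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in event_positions:
        counts[pos // window] += 1
    total = len(event_positions)
    threshold = alpha / n_windows  # Bonferroni correction over all windows
    hits = []
    for i, k in enumerate(counts):
        start, end = i * window, min((i + 1) * window, genome_length)
        expected = total * (end - start) / genome_length  # uniform background rate
        p = poisson_sf(k, expected)
        if p < threshold:
            hits.append((start, end, k, p))
    return hits


def region_genotypes(isolate_variants, start, end):
    """1 if an isolate carries any variant in [start, end), else 0 (collapsed genotype)."""
    return {iso: int(any(start <= pos < end for pos in positions))
            for iso, positions in isolate_variants.items()}


if __name__ == "__main__":
    # Hypothetical events: a tight cluster of 20 changes plus 5 scattered ones.
    events = list(range(1000, 1020)) + [5000, 30000, 50000, 70000, 90000]
    print(hypervariable_windows(events, genome_length=100_000, window=1000))
```

A real implementation would also have to choose window sizes sensibly and account for regions that are intrinsically more mutable; this sketch ignores both.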
RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification owing to its role in regulating the stability, structure, processing, and translation of RNA. Disruption of m6A homeostasis may lead to defects in stem cell regulation, reduced fertility, and increased cancer risk. To date, experimental detection and quantification of RNA m6A modification remain time-consuming and labor-intensive. Existing databases contain only a limited number of epitranscriptome samples, and a matched RNA methylation profile is often unavailable for a given biological problem of interest. Because gene expression data are readily available for most biological problems, it would be appealing to estimate RNA methylation status from gene expression data using in silico methods. In this study, we explored the computational prediction of RNA methylation status from gene expression data using classification and regression methods, based on mouse RNA methylation data collected under 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forest (RF) classifiers were constructed. SVM and RF both achieved the best performance, with a mean area under the curve (AUC) of 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on the sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All 3 terms were found to be closely related to the m6A pathway. For regression analysis, Elastic Net was implemented, yielding a mean Pearson correlation coefficient of 0.68 and a mean Spearman correlation coefficient of 0.64. Our exploratory study suggested that gene expression data could be used to construct predictors of m6A methylation status with adequate accuracy. Our work showed for the first time that RNA methylation status may be predicted from matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the early stages of a study.
Title: Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods | Authors: Hao Xue, Zhen Wei, Kunqi Chen, Yujiao Tang, Xiangyu Wu, Jionglong Su, Jia Meng | Pub Date: 2020-07-20 | DOI: 10.1177/1176934320915707 | Journal: Evolutionary Bioinformatics 16: 1176934320915707
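The evaluation metrics quoted above (AUC for the classifiers, Pearson and Spearman correlation for the Elastic Net regression) can be computed with a few lines of standard-library code. This is a minimal sketch, not the authors' pipeline: AUC is computed as the Mann-Whitney probability that a randomly chosen methylated site (label 1) scores above a randomly chosen unmethylated one (label 0), counting ties as one half, and Spearman's coefficient is Pearson's coefficient applied to average ranks. The inputs in the usage lines are illustrative.

```python
import math


def auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(tie); labels are 1 or 0."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

The pairwise AUC loop is quadratic in the number of sites; a rank-based formula gives the same value faster on large data sets.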
Pub Date: 2020-07-10 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320939945
Qin-Long Dai, Jian-Wei Li, Yi Yang, Min Li, Kan Zhang, Liu-Yang He, Jun Zhang, Bo Tang, Hui-Ping Liu, Yu-Xia Li, Li-Feng Zhu, Zhi-Song Yang, Qiang Dai
Releasing individuals is an effective conservation approach for protecting endangered species. To save the small, isolated giant panda population in Liziping Nature Reserve, a few giant pandas have been released into it. Here, we used noninvasive genetic sampling to assess the genetic diversity of this population after the releases and to predict its future changes. A total of 28 giant pandas (including 4 released individuals) were identified in Liziping, China. Compared with other giant panda populations, this population has a moderate level of genetic diversity; however, a Bayesian-coalescent method clearly detected, quantified, and dated a recent decrease in population size. Predictions of genetic diversity and population survival over the next 100 years indicate that this population has a high risk of extinction. We show that releasing giant pandas can preserve genetic diversity and improve the probability of survival of this small, isolated population. To promote its recovery, we suggest that releases continue and that 10 males and 20 females be released into this population in the future.
Title: Genetic Diversity and Prediction Analysis of Small Isolated Giant Panda Populations After Release of Individuals | Journal: Evolutionary Bioinformatics 16: 1176934320939945
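Why the sex ratio of a small population matters for long-term genetic diversity can be illustrated with two textbook Wright-Fisher approximations: the effective size of a group with N_m breeding males and N_f breeding females is N_e = 4 N_m N_f / (N_m + N_f), and expected heterozygosity is multiplied by (1 - 1/(2 N_e)) each generation through drift alone. This sketch is not the Bayesian-coalescent analysis or the population viability simulation used in the study; it ignores mortality, migration, mutation, and overlapping generations, and the group sizes and starting heterozygosity in the usage lines are hypothetical.

```python
def effective_size(n_males, n_females):
    """Wright's sex-ratio correction: Ne = 4*Nm*Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


def heterozygosity_after(h0, ne, generations):
    """Expected heterozygosity after drift alone: H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1 - 1 / (2 * ne)) ** generations


if __name__ == "__main__":
    # Hypothetical breeding group of 10 males and 20 females.
    ne = effective_size(10, 20)
    print(round(ne, 1))  # 26.7
    # Hypothetical starting heterozygosity of 0.6, followed for 10 generations.
    print(heterozygosity_after(0.6, ne, 10))
```

Under these simplifications a skewed sex ratio yields an effective size below the census count, and a larger N_e slows the loss of heterozygosity.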
Pub Date: 2020-07-09 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320939943
Akshay Yadav, David Fernández-Baca, Steven B Cannon
Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid-level changes, protein sequences can evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors” and treating the species (target + outgroup) as instances, or data points, in a data matrix. We then look for features (domains) that differ significantly between the target and outgroup species. We study domain changes in 2 large, distinct groups of plant species, legumes (Fabaceae) and grasses (Poaceae), relative to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These 4 matrices attempt to capture different aspects of domain change through which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, and changes in the number of adjacent domain partners. All feature matrices were analyzed with feature selection techniques and statistical tests to select protein domains with significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from all feature matrices. In addition, we perform domain-centric gene ontology (dcGO) enrichment analysis on all domains selected from the 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses. Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, which is responsible for repairing interstrand DNA crosslinks. In legumes, domain abundance analysis revealed an increase in glutathione synthase, which produces an antioxidant required for nitrogen fixation, and a decrease in xanthine-oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a Docker container that can be used to run this analysis workflow on any user-defined set of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.
Title: Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families | Journal: Evolutionary Bioinformatics 16: 1176934320939943
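The 4 feature types can be illustrated for a species whose proteome is given as a list of proteins, each an ordered list of domain identifiers (for example, Pfam accessions). The definitions below are simplified assumptions, not necessarily those used in the study or its Docker workflow: content is presence/absence, abundance counts proteins containing the domain, duplication counts proteins containing the domain more than once, and versatility counts distinct adjacent domain partners (self-adjacency excluded). Stacking one row per species yields the species-by-domain matrices to which feature selection and statistical tests would be applied. Function names are hypothetical.

```python
from collections import defaultdict


def domain_features(proteins):
    """Return {feature_type: {domain: value}} for one species' proteome."""
    content = {}
    abundance, duplication = defaultdict(int), defaultdict(int)
    partners = defaultdict(set)
    for domains in proteins:
        for d in set(domains):
            content[d] = 1
            abundance[d] += 1
            if domains.count(d) > 1:
                duplication[d] += 1
        # Adjacent pairs along the protein define domain partners (self-pairs skipped).
        for left, right in zip(domains, domains[1:]):
            if left != right:
                partners[left].add(right)
                partners[right].add(left)
    all_domains = set(content)
    return {
        "content": content,
        "abundance": {d: abundance[d] for d in all_domains},
        "duplication": {d: duplication[d] for d in all_domains},
        "versatility": {d: len(partners[d]) for d in all_domains},
    }


def feature_matrix(species_proteomes, feature_type):
    """Rows = species, columns = sorted union of domains; absent domains are 0."""
    per_species = {sp: domain_features(p)[feature_type]
                   for sp, p in species_proteomes.items()}
    columns = sorted(set().union(*(f.keys() for f in per_species.values())))
    rows = {sp: [f.get(d, 0) for d in columns] for sp, f in per_species.items()}
    return columns, rows


if __name__ == "__main__":
    # Toy proteomes with made-up domain labels.
    toy = {"legume": [["A", "B"], ["A", "A", "C"]], "outgroup": [["B"], ["C", "D"]]}
    for kind in ("content", "abundance", "duplication", "versatility"):
        print(kind, feature_matrix(toy, kind))
```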
Pub Date: 2020-06-30 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320934498
Xin-Ke Zhan, Zhu-Hong You, Li-Ping Li, Yang Li, Zheng Wang, Jie Pan
Protein-protein interactions (PPIs) play a crucial role in the life cycles of living cells, so it is important to understand their underlying mechanisms. Although high-throughput technologies have generated large amounts of PPI data in different organisms, experiments for detecting PPIs remain costly and time-consuming. Therefore, new computational methods for predicting PPIs are urgently needed and are attracting increasing attention. In this study, we propose a novel computational method for predicting PPIs based on texture features of protein sequences. Specifically, Gabor features are used to extract texture and protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Random forest-based classifiers are then used to infer protein interactions. On the yeast, human, and Helicobacter pylori PPI data sets, the method achieved average accuracies of 92.10%, 97.03%, and 86.45%, respectively. To further evaluate the method, we compared the Gabor feature descriptor with Discrete Cosine Transform and Local Phase Quantization descriptors. Our results show that the proposed method is both feasible and stable and that the Gabor feature descriptor reliably extracts protein sequence information. Additional experiments were conducted to predict PPIs on data sets from 4 other species. The promising results indicate that the proposed method is both powerful and robust.
Title: Using Random Forest Model Combined With Gabor Feature to Predict Protein-Protein Interaction From Protein Sequence | Journal: Evolutionary Bioinformatics 16: 1176934320934498
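The central step of the method, turning a variable-length PSSM into a fixed-length texture descriptor, can be sketched with a small bank of 2D Gabor filters. This is a hedged illustration, not the authors' feature extractor: the kernel size, sigma, wavelength, number of orientations, and the mean/standard-deviation pooling are assumed values, and the PSSM is taken as an L x 20 list of rows of the kind PSI-BLAST produces. A protein pair is represented by concatenating the two descriptors; those vectors could then be passed to a random forest classifier (for example, scikit-learn's RandomForestClassifier), a step not shown here.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter (size x size, size odd) at orientation theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def filter_valid(matrix, kernel):
    """Slide the kernel over the matrix without padding (cross-correlation)."""
    k = len(kernel)
    rows, cols = len(matrix), len(matrix[0])
    return [
        sum(kernel[a][b] * matrix[i + a][j + b] for a in range(k) for b in range(k))
        for i in range(rows - k + 1)
        for j in range(cols - k + 1)
    ]


def gabor_features(pssm, orientations=4, size=5, sigma=2.0, wavelength=4.0):
    """Fixed-length descriptor: mean and std of filter responses per orientation."""
    if len(pssm) < size or len(pssm[0]) < size:
        raise ValueError("PSSM is smaller than the Gabor kernel")
    features = []
    for n in range(orientations):
        theta = math.pi * n / orientations
        responses = filter_valid(pssm, gabor_kernel(size, sigma, theta, wavelength))
        mean = sum(responses) / len(responses)
        std = math.sqrt(sum((r - mean) ** 2 for r in responses) / len(responses))
        features.extend([mean, std])
    return features


def pair_features(pssm_a, pssm_b):
    """Descriptor for a protein pair: concatenation of the two per-protein descriptors."""
    return gabor_features(pssm_a) + gabor_features(pssm_b)


if __name__ == "__main__":
    # Made-up PSSMs of different lengths (12 and 30 residues, 20 columns each).
    short = [[(i * j) % 7 - 3 for j in range(20)] for i in range(12)]
    long_ = [[(i + 2 * j) % 5 - 2 for j in range(20)] for i in range(30)]
    print(len(pair_features(short, long_)))
```

Pooling the filter responses is what makes proteins of different lengths comparable; richer pooling (for example, several scales or region-wise statistics) would lengthen the descriptor.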
Pub Date: 2020-06-26 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320936266
Iván Darío Ocampo-Ibáñez, Yamil Liscano, Sandra Patricia Rivera-Sánchez, José Oñate-Garzón, Ashley Dayan Lugo-Guevara, Liliana Janeth Flórez-Elvira, Maria Cristina Lesmes
Infections caused by multidrug-resistant (MDR) Pseudomonas aeruginosa and Klebsiella pneumoniae are a serious worldwide public health concern due to the ineffectiveness of empirical antibiotic therapy against them. Research into, and development of, alternatives to antibiotics are therefore urgently needed to control these bacteria. Cationic antimicrobial peptides (CAMPs) are a promising alternative therapeutic strategy to antibiotics because they exhibit antibacterial activity against both antibiotic-susceptible and MDR strains. In this study, we investigated the in vitro antibacterial effect of a short synthetic CAMP derived from the ΔM2 analog of Cec D-like (CAMP-CecD) against clinical isolates of K pneumoniae (n = 30) and P aeruginosa (n = 30), as well as its hemolytic activity. Minimal inhibitory concentrations (MICs) and minimal bactericidal concentrations (MBCs) of CAMP-CecD against wild-type and MDR strains were determined by broth microdilution. In addition, an in silico molecular dynamics simulation was performed to predict the interaction between CAMP-CecD and membrane models of K pneumoniae and P aeruginosa. The results revealed a bactericidal effect of CAMP-CecD against both wild-type and resistant strains, with MDR P aeruginosa showing higher susceptibility to the peptide (MIC values between 32 and >256 μg/mL). CAMP-CecD showed higher stability in the P aeruginosa membrane model than in the K pneumoniae model owing to a greater number of noncovalent interactions with the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-(phospho-rac-(1-glycerol)) (POPG). This may be related to the greater effectiveness of the peptide against P aeruginosa clinical isolates. Given the antibacterial activity of CAMP-CecD against wild-type and MDR clinical isolates of P aeruginosa and K pneumoniae and its lack of hemolytic activity against human erythrocytes, CAMP-CecD may be a promising alternative to conventional antibiotics.
Title: A Novel Cecropin D-Derived Short Cationic Antimicrobial Peptide Exhibits Antibacterial Activity Against Wild-Type and Multidrug-Resistant Strains of Klebsiella pneumoniae and Pseudomonas aeruginosa. Evolutionary Bioinformatics, 16, 1176934320936266 (published 2020-06-26). DOI: 10.1177/1176934320936266
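The MIC and MBC values above come from reading a twofold broth microdilution series. The sketch below shows that readout under the usual definitions. The MIC is the lowest concentration with no visible growth. The MBC is the lowest concentration that kills at least 99.9% of the starting inoculum after subculture. The function names `read_mic` and `read_mbc` are hypothetical, and so are the example readings. None of them are taken from the study.

```python
from typing import Sequence


def read_mic(concentrations: Sequence[float], growth: Sequence[bool]) -> str:
    """Return the MIC for one microdilution row, or '>max' if no well is clear.

    Hypothetical helper. To be conservative about skipped wells, the MIC is
    taken as the lowest concentration at and above which every well is clear.
    """
    if len(concentrations) != len(growth):
        raise ValueError("need exactly one growth reading per well")
    wells = sorted(zip(concentrations, growth))  # lowest concentration first
    mic = None
    # Scan from the highest concentration downward; stop at the first turbid well.
    for conc, grew in reversed(wells):
        if grew:
            break
        mic = conc
    if mic is None:
        return f">{wells[-1][0]:g}"
    return f"{mic:g}"


def read_mbc(concentrations: Sequence[float],
             cfu_per_ml: Sequence[float],
             inoculum_cfu_per_ml: float) -> str:
    """Return the lowest concentration giving >=99.9% kill, or '>max'.

    Hypothetical helper. Survivors must be at most 0.1% of the inoculum.
    """
    threshold = inoculum_cfu_per_ml / 1000
    for conc, cfu in sorted(zip(concentrations, cfu_per_ml)):
        if cfu <= threshold:
            return f"{conc:g}"
    return f">{max(concentrations):g}"
```

For example, a row with growth up to 16 μg/mL that is clear from 32 μg/mL upward reads as an MIC of 32 μg/mL. A row with growth in every well up to 256 μg/mL is reported as >256 μg/mL, which matches the notation used in the abstract.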
Pub Date: 2020-06-24; eCollection Date: 2020-01-01; DOI: 10.1177/1176934320920310
Amira Al-Aamri, Kamal Taha, Maher Maalouf, Andrzej Kudlicki, Dirar Homouz
Computational prediction of gene-gene associations is one of the productive directions in bioinformatics. Many tools have been developed to infer relationships between genes from different biological data sources. An association between a pair of genes inferred from biological data becomes meaningful when it reflects the directionality and type of interaction between the genes. In this work, we take a different approach: we construct a causal gene co-expression network from microarray expression data while identifying the transcription factor in each gene pair. We adopt a machine learning technique based on a logistic regression model to handle the sparsity of the network and to improve prediction accuracy. The proposed system classifies each gene pair as either connected or nonconnected, using the correlation between the 2 genes across the whole Saccharomyces cerevisiae genome. The accuracy of the classification model in predicting related genes was evaluated on several data sets for the yeast regulatory network. Our system performs well on several statistical measures.
Title: Inferring Causation in Yeast Gene Association Networks With Kernel Logistic Regression. Evolutionary Bioinformatics, 16, 1176934320920310 (published 2020-06-24). DOI: 10.1177/1176934320920310
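The title names kernel logistic regression as the classifier. The sketch below is an illustration only: it trains an RBF-kernel logistic regression by functional gradient descent to label gene pairs as connected (1) or nonconnected (0). Each pair gets 2 hypothetical features. The first is the Pearson correlation of the 2 expression profiles. The second is a lag-1 correlation (gene A at time t against gene B at time t+1), used here as a crude proxy for directionality. The features, the hyperparameters, and the synthetic `make_pairs` data are all assumptions and do not reproduce the authors' pipeline.

```python
import numpy as np


def pair_features(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical features for one gene pair: Pearson r and lag-1 r (x_t vs y_{t+1})."""
    r0 = np.corrcoef(x, y)[0, 1]
    r1 = np.corrcoef(x[:-1], y[1:])[0, 1]
    return np.array([r0, r1])


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def fit_klr(X, y, gamma=2.0, lam=1e-3, lr=1.0, epochs=500):
    """Fit f(x) = sum_j alpha_j K(x, x_j) + b by functional gradient descent.

    The objective is mean log-loss + (lam / 2) * ||f||^2 in the kernel's function space.
    """
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    for _ in range(epochs):
        p = sigmoid(K @ alpha + b)
        err = (p - y) / n
        alpha -= lr * (err + lam * alpha)  # function-space gradient step
        b -= lr * err.sum()
    return alpha, b


def predict_klr(X_train, alpha, b, X_new, gamma=2.0):
    """Return 1 (connected) or 0 (nonconnected) for each row of X_new."""
    scores = rbf_kernel(X_new, X_train, gamma) @ alpha + b
    return (sigmoid(scores) >= 0.5).astype(int)


def make_pairs(rng, n_per_class, T=20, noise=0.3):
    """Hypothetical synthetic data: in connected pairs, gene y follows gene x with a 1-step lag."""
    feats, labels = [], []
    for label in (1, 0):
        for _ in range(n_per_class):
            x = rng.normal(size=T)
            if label:
                y = np.empty(T)
                y[0] = rng.normal()
                y[1:] = x[:-1] + noise * rng.normal(size=T - 1)
            else:
                y = rng.normal(size=T)
            feats.append(pair_features(x, y))
            labels.append(label)
    return np.array(feats), np.array(labels, dtype=float)
```

The dual coefficients are updated with `(p - y) / n + lam * alpha`. This is gradient descent in the kernel's function space. Because RBF kernel entries are bounded by 1, a fixed step size of about 1 stays stable. A production classifier would more likely use a Newton or quasi-Newton solver.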
Pub Date: 2020-06-23; eCollection Date: 2020-01-01; DOI: 10.1177/1176934320908267
Xi Long, Hong Xue, J Tze-Fei Wong
The 3 biological domains delineated on the basis of small subunit ribosomal RNAs (SSU rRNAs) face uncertainties regarding the relationship between Archaea and Bacteria and the origin of Eukarya. BLASTP was used to estimate the similarities between the paralogous valyl-tRNA and isoleucyl-tRNA synthetases in 5398 species. These similarities decreased from Archaea to Bacteria and further to Eukarya. This pattern was consistent with vertical gene transmission from an archaeal root of life close to Methanopyrus kandleri, through a Primitive Archaea Cluster to an Ancestral Bacteria Cluster, and then to Eukarya. The ribosomal proteins (rProts) of eukaryotes were predominantly more similar to archaeal rProts than to bacterial rProts. This established that an archaeal parent, rather than a bacterial parent, underwent genome merger with bacteria to generate eukaryotes with mitochondria. Eukaryogenesis benefited from the predominantly archaeal accelerated gene adoption (AGA) phenotype. AGA pertains to horizontally transferred genes from other prokaryotes and expedited genome evolution through both gene-content mutations and nucleotidyl mutations. Archaeons endowed with substantial AGA activity were accordingly favored as candidate archaeal parents. The Aciduliprofundum archaea were identified as leading candidates for the archaeal parent, based on their high AGA activity and on the top similarity bitscores of their proteomes toward the eukaryotic proteomes of Giardia and Trichomonas. The Asgard archaeons and a number of bacterial species were among the foremost potential contributors of eukaryotic-like proteins to Eukarya.
Title: Descent of Bacteria and Eukarya From an Archaeal Root of Life. Evolutionary Bioinformatics, 16, 1176934320908267 (published 2020-06-23). DOI: 10.1177/1176934320908267
Pub Date: 2020-06-15; eCollection Date: 2020-01-01; DOI: 10.1177/1176934320930263
Junhui Wang, Nan Lu, Fei Yi, Yao Xiao
Transposable elements (TEs) are known to play roles in genome evolution, gene regulation, and epigenetics. This makes them potential tools for genetic research and breeding in conifers. Recently, advances in high-throughput sequencing have led to the publication of more conifer genomes. In our previous study, we used bioinformatics tools to identify the TEs of 3 important conifers (Picea abies, Picea glauca, and Pinus taeda). That work provided a foundation for accelerating the use of TEs in conifer breeding and genetic studies. Here, we review recent studies on the functional biology of TEs and discuss their potential applications in conifers.
Title: Identification of Transposable Elements in Conifer and Their Potential Application in Breeding. Evolutionary Bioinformatics, 16, 1176934320930263 (published 2020-06-15). DOI: 10.1177/1176934320930263