Protein-protein interactions (PPIs) in plants are essential for understanding the regulation of biological processes. Although high-throughput technologies have been widely used to identify PPIs, they are usually laborious and expensive, and they suffer from high false-positive rates. It is therefore important to develop computational approaches as a supplementary tool for detecting PPIs in plants. In this work, we present a method, DST-RoF, that identifies PPIs in plants by combining an ensemble learning classifier, Rotation Forest (RoF), with the discrete sine transformation (DST). Specifically, each plant protein sequence is first converted into a Position-Specific Scoring Matrix (PSSM). The discrete sine transformation is then applied to the PSSM to extract features that capture the evolutionary information of the proteins. Finally, these features are fed into the RoF classifier for training and prediction. On the Arabidopsis, Rice, and Maize plant datasets, DST-RoF achieved prediction accuracies of 82.95%, 88.82%, and 93.70%, respectively. To further evaluate the predictive ability of our approach, we compared it with 4 state-of-the-art classifiers and 3 different feature extraction methods. The experimental results indicate that our method is feasible and robust for predicting potential interacting protein pairs in plants.
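The feature-extraction step of DST-RoF (PSSM, then DST, then a fixed-length vector) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which DST variant is used or how many coefficients are retained, so the type-I transform, the `keep` parameter, and the function names `dst_1d`, `dst_features`, and `pair_features` are all assumptions. The Rotation Forest classifier is left out.

```python
import math


def dst_1d(signal):
    """Unnormalised type-I discrete sine transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * math.sin(math.pi * (i + 1) * (k + 1) / (n + 1))
            for i, x in enumerate(signal))
        for k in range(n)
    ]


def dst_features(pssm, keep=4):
    """Concatenate the first `keep` DST coefficients of every PSSM column.

    `pssm` is a list of rows (one per residue), each holding 20 scores.
    `keep` is a hypothetical truncation length, not a value from the paper.
    """
    features = []
    for col in range(len(pssm[0])):
        column = [row[col] for row in pssm]
        features.extend(dst_1d(column)[:keep])
    return features


def pair_features(pssm_a, pssm_b, keep=4):
    """Describe a candidate protein pair by joining both feature vectors."""
    return dst_features(pssm_a, keep) + dst_features(pssm_b, keep)
```

Truncating the transform gives every protein a vector of the same length (20 × `keep`) regardless of sequence length, which is what a downstream classifier needs.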
{"title":"Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation With Rotation Forest.","authors":"Jie Pan, Li-Ping Li, Chang-Qing Yu, Zhu-Hong You, Yong-Jian Guan, Zhong-Hao Ren","doi":"10.1177/11769343211050067","DOIUrl":"https://doi.org/10.1177/11769343211050067","url":null,"abstract":"<p><p>Protein-protein interactions (PPIs) in plants are essential for understanding the regulation of biological processes. Although high-throughput technologies have been widely used to identify PPIs, they are usually laborious, expensive, and suffer from high false-positive rates. Therefore, it is imperative to develop novel computational approaches as a supplement tool to detect PPIs in plants. In this work, we presented a method, namely DST-RoF, to identify PPIs in plants by combining an ensemble learning classifier-Rotation Forest (RoF) with discrete sine transformation (DST). Specifically, plant protein sequence is firstly converted into Position-Specific Scoring Matrix (PSSM). Then, the discrete sine transformation was employed to extract effective features for obtaining the evolutionary information of proteins. Finally, these optimal features were fed into the RoF classifier for training and prediction. When performed on the plant datasets Arabidopsis, Rice, and Maize, DST-RoF yielded high prediction accuracy of 82.95%, 88.82%, and 93.70%, respectively. To further evaluate the prediction ability of our approach, we compared it with 4 state-of-the-art classifiers and 3 different feature extraction methods. 
Comprehensive experimental results anticipated that our method is feasible and robust for predicting potential plant-protein interacted pairs.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211050067"},"PeriodicalIF":2.6,"publicationDate":"2021-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b4/46/10.1177_11769343211050067.PMC8521741.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39560690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SARS-CoV-2 needs to make efficient use of host resources in order to survive and propagate. Among the multiple layers of the regulatory network, mRNA translation is the rate-limiting step in gene expression. Synonymous codon usage usually conforms to tRNA concentration to allow fast decoding during translation. It is acknowledged that SARS-CoV-2 has adapted to the codon usage of human lungs so that the virus can proliferate rapidly in the lung environment. While this notion seems to explain the adaptation of SARS-CoV-2 to lungs, it does not explain why other viruses lack this advantage. In this study, we retrieve the GTEx RNA-seq data for 30 tissues (belonging to over 17 000 individuals). We calculate the RSCU (relative synonymous codon usage) weighted by gene expression in each human sample, and investigate the correlation of RSCU between the human tissues and SARS-CoV-2 or RaTG13 (the closest coronavirus to SARS-CoV-2). Lung has the highest correlation of RSCU with SARS-CoV-2 among all tissues, suggesting that the lung environment is generally suitable for SARS-CoV-2. Interestingly, for most tissues, SARS-CoV-2 correlates more strongly with the human samples than RaTG13 does. This difference is most significant for lungs. In conclusion, the codon usage of SARS-CoV-2 has adapted to human lungs to allow fast decoding and translation. This adaptation probably took place after SARS-CoV-2 split from RaTG13, because RaTG13 is less well correlated with human tissues. This finding depicts the trajectory of adaptive evolution from the ancestral sequence to SARS-CoV-2, and also explains why SARS-CoV-2, rather than other viruses, could adapt so well to the human lung environment.
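An expression-weighted RSCU calculation of the kind described in this abstract can be sketched as below. RSCU for a codon is its observed count divided by the mean count of its synonymous family. The weighting scheme is an assumption: here each gene's codon counts are simply multiplied by its expression value, which may differ from what the authors did. The helper names `weighted_codon_counts`, `rscu`, and `pearson` are illustrative.

```python
import math
from collections import defaultdict

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def weighted_codon_counts(genes):
    """Sum codon counts over (cds, expression) pairs, weighted by expression."""
    counts = defaultdict(float)
    for cds, expression in genes:
        cds = cds.upper().replace("U", "T")
        for pos in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[pos:pos + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += expression
    return counts


def rscu(counts):
    """RSCU = observed count / mean count within the synonymous family."""
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":  # stop codons are excluded
            families[amino_acid].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0.0) for c in codons)
        if total > 0:
            for c in codons:
                values[c] = counts.get(c, 0.0) * len(codons) / total
    return values


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

To compare a tissue with a virus, one would compute `rscu` for each, take the codons present in both dictionaries, and pass the two value lists to `pearson`.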
{"title":"Compelling Evidence Suggesting the Codon Usage of SARS-CoV-2 Adapts to Human After the Split From RaTG13.","authors":"Yanping Zhang, Xiaojie Jin, Haiyan Wang, Yaoyao Miao, Xiaoping Yang, Wenqing Jiang, Bin Yin","doi":"10.1177/11769343211052013","DOIUrl":"10.1177/11769343211052013","url":null,"abstract":"<p><p>SARS-CoV-2 needs to efficiently make use of the resources from hosts in order to survive and propagate. Among the multiple layers of regulatory network, mRNA translation is the rate-limiting step in gene expression. Synonymous codon usage usually conforms with tRNA concentration to allow fast decoding during translation. It is acknowledged that SARS-CoV-2 has adapted to the codon usage of human lungs so that the virus could rapidly proliferate in the lung environment. While this notion seems to nicely explain the adaptation of SARS-CoV-2 to lungs, it is unable to tell why other viruses do not have this advantage. In this study, we retrieve the GTEx RNA-seq data for 30 tissues (belonging to over 17 000 individuals). We calculate the RSCU (relative synonymous codon usage) weighted by gene expression in each human sample, and investigate the correlation of RSCU between the human tissues and SARS-CoV-2 or RaTG13 (the closest coronavirus to SARS-CoV-2). Lung has the highest correlation of RSCU to SARS-CoV-2 among all tissues, suggesting that the lung environment is generally suitable for SARS-CoV-2. Interestingly, for most tissues, SARS-CoV-2 has higher correlations with the human samples compared with the RaTG13-human correlation. This difference is most significant for lungs. In conclusion, the codon usage of SARS-CoV-2 has adapted to human lungs to allow fast decoding and translation. This adaptation probably took place after SARS-CoV-2 split from RaTG13 because RaTG13 is less perfectly correlated with human. 
This finding depicts the trajectory of adaptive evolution from ancestral sequence to SARS-CoV-2, and also well explains why SARS-CoV-2 rather than other viruses could perfectly adapt to human lung environment.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211052013"},"PeriodicalIF":2.6,"publicationDate":"2021-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/5c/4a/10.1177_11769343211052013.PMC8504689.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39518083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-24; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211046020
Yongjiang Qian, Lili Zhang, Zhen Sun, Guangyao Zang, Yalan Li, Zhongqun Wang, Lihua Li
Atherosclerosis is a multifaceted disease characterized by the formation and accumulation of plaques that attach to arteries and cause cardiovascular disease and vascular embolism. A range of diagnostic techniques, including selective coronary angiography, stress tests, computerized tomography, and nuclear scans, assess cardiovascular disease risk and treatment targets. However, there is currently no simple blood biochemical index or biological target for the diagnosis of atherosclerosis. Therefore, it is of interest to find a biochemical blood marker for atherosclerosis. Three datasets from the Gene Expression Omnibus (GEO) database were analyzed to obtain differentially expressed genes (DEGs), and the results were integrated using the RobustRankAggreg algorithm. The genes ranked as most critical by the RobustRankAggreg algorithm were then verified in their own datasets and in a dataset with cell classification information. Twenty-one candidate genes were screened out. Interestingly, we found a good correlation between RPS4Y1, EIF1AY, and XIST. In addition, we characterized the general expression of these genes in different cell types and in whole blood cells. In this study, we identified BTNL8 and BLNK as having good clinical significance. These results will contribute to the analysis of the underlying genes involved in the progression of atherosclerosis and provide insights for the discovery of new diagnostic and evaluation methods.
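The idea behind integrating several ranked DEG lists with robust rank aggregation can be illustrated with a small pure-Python sketch. It follows the published description of the method as I understand it: normalised ranks are compared against a uniform null model through beta order statistics, and the minimum tail probability is the rho score. It is not the RobustRankAggreg R package itself. Treating genes that are absent from a list as rank 1.0 and the Bonferroni-style multiplication by the number of lists are assumptions, and the function names are illustrative.

```python
from math import comb


def rho_score(normalized_ranks):
    """Smallest order-statistic tail probability under uniform random ranks."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    best = 1.0
    for k, r in enumerate(ranks, start=1):
        # P(at least k of n uniform ranks fall at or below r)
        p = sum(comb(n, l) * r ** l * (1 - r) ** (n - l)
                for l in range(k, n + 1))
        best = min(best, p)
    return best


def aggregate(ranked_lists):
    """Return (gene, corrected score) pairs, most consistently top-ranked first.

    Each input list is ordered best-first. Genes missing from a list are
    given the worst normalised rank, 1.0 (an assumption of this sketch).
    """
    genes = set().union(*ranked_lists)
    scores = {}
    for gene in genes:
        ranks = [
            (lst.index(gene) + 1) / len(lst) if gene in lst else 1.0
            for lst in ranked_lists
        ]
        scores[gene] = min(rho_score(ranks) * len(ranks), 1.0)
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))
```

A gene that sits near the top of most lists receives a small score even if one dataset ranks it poorly, which is the property that makes the approach robust to a noisy study.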
{"title":"Biomarkers of Blood from Patients with Atherosclerosis Based on Bioinformatics Analysis.","authors":"Yongjiang Qian, Lili Zhang, Zhen Sun, Guangyao Zang, Yalan Li, Zhongqun Wang, Lihua Li","doi":"10.1177/11769343211046020","DOIUrl":"https://doi.org/10.1177/11769343211046020","url":null,"abstract":"<p><p>Atherosclerosis is a multifaceted disease characterized by the formation and accumulation of plaques that attach to arteries and cause cardiovascular disease and vascular embolism. A range of diagnostic techniques, including selective coronary angiography, stress tests, computerized tomography, and nuclear scans, assess cardiovascular disease risk and treatment targets. However, there is currently no simple blood biochemical index or biological target for the diagnosis of atherosclerosis. Therefore, it is of interest to find a biochemical blood marker for atherosclerosis. Three datasets from the Gene Expression Omnibus (GEO) database were analyzed to obtain differentially expressed genes (DEG) and the results were integrated using the Robustrankaggreg algorithm. The genes considered more critical by the Robustrankaggreg algorithm were put into their own data set and the data set system with cell classification information for verification. Twenty-one possible genes were screened out. Interestingly, we found a good correlation between <i>RPS4Y1</i>, <i>EIF1AY</i>, and <i>XIST</i>. In addition, we know the general expression of these genes in different cell types and whole blood cells. In this study, we identified <i>BTNL8</i> and <i>BLNK</i> as having good clinical significance. 
These results will contribute to the analysis of the underlying genes involved in the progression of atherosclerosis and provide insights for the discovery of new diagnostic and evaluation methods.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211046020"},"PeriodicalIF":2.6,"publicationDate":"2021-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/a0/be/10.1177_11769343211046020.PMC8477683.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39477141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-09; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211046767
[This corrects the article DOI: 10.1177/1176934320901721.].
{"title":"Erratum to \"On the matrix condition of phylogenetic tree\".","authors":"","doi":"10.1177/11769343211046767","DOIUrl":"https://doi.org/10.1177/11769343211046767","url":null,"abstract":"<p><p>[This corrects the article DOI: 10.1177/1176934320901721.].</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211046767"},"PeriodicalIF":2.6,"publicationDate":"2021-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/36/d0/10.1177_11769343211046767.PMC8436297.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39421044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-26; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211041382
Chaoxin Zhang, Tao Wang, Tongyan Cui, Shengwei Liu, Bing Zhang, Xue Li, Jian Tang, Peng Wang, Yuanyuan Guo, Zhipeng Wang
The CCAAT/enhancer binding protein (C/EBP) transcription factors (TFs) regulate many important biological processes, such as energy metabolism, inflammation, and cell proliferation. Genome-wide gene identification revealed a total of 99 C/EBP genes in the pig and 19 eukaryote genomes. Phylogenetic analysis showed that all C/EBP TFs fall into 6 subgroups, named C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, C/EBPγ, and C/EBPζ. Gene expression analysis showed that the C/EBPα, C/EBPβ, C/EBPδ, C/EBPγ, and C/EBPζ genes were expressed ubiquitously, with inconsistent expression patterns across pig tissues. Moreover, a pig C/EBP regulatory network was constructed, including C/EBP genes, TFs, and miRNAs. A total of 27 feed-forward loop (FFL) motifs were detected in the pig C/EBP regulatory network. Based on the RNA-seq data, gene expression patterns related to the FFL sub-network were analyzed in 27 adult pig tissues. Certain FFL motifs may be tissue specific. Functional enrichment analysis indicated that C/EBP and its target genes are involved in many important biological pathways. These results provide valuable information that clarifies the evolutionary relationships of the C/EBP family and contributes to the understanding of the biological function of C/EBP genes.
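A feed-forward loop is a three-node pattern in which a regulator X targets both Y and Z, and Y also targets Z. Detecting such motifs in a directed regulatory network can be sketched as below. This is a generic enumeration, not the authors' pipeline: the abstract does not say which tool was used or how the TF, miRNA, and gene roles were constrained, and the node names in the usage note are hypothetical.

```python
from collections import defaultdict


def find_ffls(edges):
    """List every feed-forward loop (x, y, z) with edges x->y, y->z, x->z."""
    successors = defaultdict(set)
    for source, target in edges:
        if source != target:  # self-regulation is not part of an FFL
            successors[source].add(target)
    loops = []
    for x in sorted(successors):
        for y in sorted(successors[x]):
            for z in sorted(successors.get(y, ())):
                if z != x and z in successors[x]:
                    loops.append((x, y, z))
    return loops
```

For example, `find_ffls([("TF_A", "miR_X"), ("miR_X", "gene_T"), ("TF_A", "gene_T")])` reports the single loop `("TF_A", "miR_X", "gene_T")`, while removing the direct `TF_A -> gene_T` edge leaves only a linear cascade and no loop.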
{"title":"Genome-Wide Phylogenetic Analysis, Expression Pattern, and Transcriptional Regulatory Network of the Pig C/EBP Gene Family.","authors":"Chaoxin Zhang, Tao Wang, Tongyan Cui, Shengwei Liu, Bing Zhang, Xue Li, Jian Tang, Peng Wang, Yuanyuan Guo, Zhipeng Wang","doi":"10.1177/11769343211041382","DOIUrl":"10.1177/11769343211041382","url":null,"abstract":"<p><p>The CCAAT/enhancer binding protein (C/EBP) transcription factors (TFs) regulate many important biological processes, such as energy metabolism, inflammation, cell proliferation etc. A genome-wide gene identification revealed the presence of a total of 99 C/EBP genes in pig and 19 eukaryote genomes. Phylogenetic analysis showed that all C/EBP TFs were classified into 6 subgroups named <i>C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, C/EBPγ</i>, and <i>C/EBPζ</i>. Gene expression analysis showed that the <i>C/EBPα, C/EBPβ, C/EBPδ, C/EBPγ</i>, and <i>C/EBPζ</i> genes were expressed ubiquitously with inconsistent expression patterns in various pig tissues. Moreover, a pig C/EBP regulatory network was constructed, including C/EBP genes, TFs and miRNAs. A total of 27 feed-forward loop (FFL) motifs were detected in the pig C/EBP regulatory network. Based on the RNA-seq data, gene expression patterns related to FFL sub-network were analyzed in 27 adult pig tissues. Certain FFL motifs may be tissue specific. Functional enrichment analysis indicated that C/EBP and its target genes are involved in many important biological pathways. 
These results provide valuable information that clarifies the evolutionary relationships of the C/EBP family and contributes to the understanding of the biological function of C/EBP genes.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211041382"},"PeriodicalIF":2.6,"publicationDate":"2021-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/82/3d/10.1177_11769343211041382.PMC8404664.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39375403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-16; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211038948
Olubukola Oluranti Babalola, Bartholomew Saanu Adeleke, Ayansina Segun Ayangbenro
In recent times, diverse agriculturally important endophytic bacteria colonizing the plant endosphere have been identified. Harnessing the potential of Bacillus species from sunflower could reveal their biotechnological and agricultural importance. Here, we present genomic insights into B. cereus T4S isolated from sunflower sourced from Lichtenburg, South Africa. Genome analysis revealed a sequence read count of 7 255 762, a genome size of 5 945 881 bp, and a G + C content of 34.8%. The genome contains protein-coding genes involved in various metabolic pathways. The detection of genes involved in the metabolism of organic substrates and chemotaxis could enhance plant-microbe interactions in the synthesis of biological products with biotechnological and agricultural importance.
{"title":"Whole Genome Sequencing of Sunflower Root-Associated <i>Bacillus cereus</i>.","authors":"Olubukola Oluranti Babalola, Bartholomew Saanu Adeleke, Ayansina Segun Ayangbenro","doi":"10.1177/11769343211038948","DOIUrl":"https://doi.org/10.1177/11769343211038948","url":null,"abstract":"<p><p>In recent times, diverse agriculturally important endophytic bacteria colonizing plant endosphere have been identified. Harnessing the potential of <i>Bacillus</i> species from sunflower could reveal their biotechnological and agricultural importance. Here, we present genomic insights into <i>B. cereus</i> T4S isolated from sunflower sourced from Lichtenburg, South Africa. Genome analysis revealed a sequence read count of 7 255 762, a genome size of 5 945 881 bp, and G + C content of 34.8%. The genome contains various protein-coding genes involved in various metabolic pathways. The detection of genes involved in the metabolism of organic substrates and chemotaxis could enhance plant-microbe interactions in the synthesis of biological products with biotechnological and agricultural importance.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211038948"},"PeriodicalIF":2.6,"publicationDate":"2021-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/ca/16/10.1177_11769343211038948.PMC8375328.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39334116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-24; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211035141
Yupeng Wang, Ying Sun, Paule Valery Joseph
In humans, taste genes are responsible for perceiving at least 5 different taste qualities. The evolutionary mechanisms of human taste genes remain to be explored. We compiled a list of 69 human taste-related genes and divided them into 7 functional groups. We carried out comparative genomic and evolutionary analyses for these taste genes based on 8 vertebrate species. We found that, relative to other groups of human taste genes, human TAS2R genes have a higher proportion of tandem duplicates, suggesting that tandem duplications have contributed significantly to the expansion of the human TAS2R gene family. Human TAS2R genes tend to have fewer collinear genes in outgroup species and evolve faster, suggesting that human TAS2R genes have experienced more gene relocations. Moreover, human TAS2R genes tend to be under more relaxed purifying selection than other genes. Our study provides new insights into diverse and contrasting evolutionary patterns among human taste genes.
{"title":"Contrasting Patterns of Gene Duplication, Relocation, and Selection Among Human Taste Genes.","authors":"Yupeng Wang, Ying Sun, Paule Valery Joseph","doi":"10.1177/11769343211035141","DOIUrl":"https://doi.org/10.1177/11769343211035141","url":null,"abstract":"<p><p>In humans, taste genes are responsible for perceiving at least 5 different taste qualities. Human taste genes' evolutionary mechanisms need to be explored. We compiled a list of 69 human taste-related genes and divided them into 7 functional groups. We carried out comparative genomic and evolutionary analyses for these taste genes based on 8 vertebrate species. We found that relative to other groups of human taste genes, human TAS2R genes have a higher proportion of tandem duplicates, suggesting that tandem duplications have contributed significantly to the expansion of the human TAS2R gene family. Human TAS2R genes tend to have fewer collinear genes in outgroup species and evolve faster, suggesting that human TAS2R genes have experienced more gene relocations. Moreover, human TAS2R genes tend to be under more relaxed purifying selection than other genes. Our study sheds new insights into diverse and contrasting evolutionary patterns among human taste genes.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211035141"},"PeriodicalIF":2.6,"publicationDate":"2021-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/11769343211035141","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39299985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-21; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211035140
Ting Qi, Yuming Xu, Tong Zhou, Wanjun Gu
The RNA G-quadruplex (rG4) is a non-canonical high-order secondary structure with important biological functions and is enriched in untranslated regions (UTRs) of protein-coding genes. However, how rG4 structures evolve is largely unknown. Here, we systematically investigated the evolution of RNA sequences around UTR rG4 structures in 5 eukaryotic organisms. We found universal selection on UTR sequences, which facilitated rG4 formation in all the organisms that we analyzed. While G-rich sequences were preferred in the rG4 structural region, C-rich sequences were selectively disfavored. The selective pressure acting on rG4 structures in the UTRs of genes with higher G content was significantly weaker. Furthermore, we found that rG4 structures were under weaker evolutionary selection near the translation initiation region in the 5' UTR, near the polyadenylation signals in the 3' UTR, and in regions flanking the miRNA targets in the 3' UTR. These results suggest universal selection for rG4 formation in the UTRs of eukaryotic genomes, and the selection may be related to the biological functions of rG4s.
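To make the notion of a G-rich rG4-forming sequence concrete, the sketch below scans a UTR for the widely used canonical quadruplex pattern: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This pattern is an assumption chosen for illustration. The abstract does not state how the authors located rG4 structures, and experimentally detected rG4s need not match this motif. The names `G4_PATTERN` and `find_putative_rg4` are hypothetical.

```python
import re

# Four G-runs (>= 3 G each) joined by three loops of 1-7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_putative_rg4(sequence):
    """Return (start, matched_text) for non-overlapping canonical motifs.

    The input may be RNA or DNA in any case; T is converted to U first.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(rna)]
```

For example, `find_putative_rg4("AAGGGAGGGAGGGAGGGAA")` reports one motif starting at index 2, whereas a sequence with only two G-runs yields an empty list. Counting such motifs in real UTRs against shuffled controls is one simple way to ask whether G-rich tracts are over-represented.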
{"title":"The Evolution of G-quadruplex Structure in mRNA Untranslated Region.","authors":"Ting Qi, Yuming Xu, Tong Zhou, Wanjun Gu","doi":"10.1177/11769343211035140","DOIUrl":"https://doi.org/10.1177/11769343211035140","url":null,"abstract":"<p><p>The RNA G-quadruplex (rG4) is a kind of non-canonical high-order secondary structure with important biological functions and is enriched in untranslated regions (UTRs) of protein-coding genes. However, how rG4 structures evolve is largely unknown. Here, we systematically investigated the evolution of RNA sequences around UTR rG4 structures in 5 eukaryotic organisms. We found universal selection on UTR sequences, which facilitated rG4 formation in all the organisms that we analyzed. While <i>G</i>-rich sequences were preferred in the rG4 structural region, <i>C</i>-rich sequences were selectively not preferred. The selective pressure acting on rG4 structures in the UTRs of genes with higher <i>G</i> content was significantly smaller. Furthermore, we found that rG4 structures experienced smaller evolutionary selection near the translation initiation region in the 5' UTR, near the polyadenylation signals in the 3' UTR, and in regions flanking the miRNA targets in the 3' UTR. 
These results suggest universal selection for rG4 formation in the UTRs of eukaryotic genomes and the selection may be related to the biological functions of rG4s.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211035140"},"PeriodicalIF":2.6,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/11769343211035140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39299984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-14; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211031871
Ernest P Williams, Tsvetan R Bachvaroff, Allen R Place
Many dinoflagellate species make toxins in a myriad of different molecular configurations, but the underlying chemistry in all cases is presumably via modular synthases, primarily polyketide synthases. In many organisms, modular synthases occur as discrete synthetic genes or domains within a gene that act in coordination, thus forming a module that produces a particular fragment of a natural product. The modules usually occur in tandem as gene clusters with a syntenic arrangement that is often predictive of the resultant structure. Dinoflagellate genomes, however, are notoriously complex, with individual genes present in many tandem repeats and very few synthetic modules occurring as gene clusters, unlike what has been seen in bacteria and fungi. However, modular synthesis in all organisms requires a free thiol group that acts as a carrier for sequential synthesis, called a thiolation domain. We scanned 47 dinoflagellate transcriptomes for 23 modular synthase domain models and compared their abundance among 10 orders of dinoflagellates as well as their co-occurrence with thiolation domains. The total count of domain types was quite large, with over 30 000 identified, 29 000 of which were in the core dinoflagellates. Although there were no specific trends in domain abundance associated with types of toxins, there were readily observable lineage-specific differences. The Gymnodiniales, makers of long polyketide toxins such as brevetoxin and karlotoxin, had a high relative abundance of thiolation domains as well as multiple thiolation domains within a single transcript. Orders such as the Gonyaulacales, makers of small polyketides such as spirolides, had fewer thiolation domains but a relative increase in the number of acyl transferases. 
Unique to the core dinoflagellates, however, were thiolation domains occurring alongside tetratricopeptide repeats that facilitate protein-protein interactions, especially hexa and hepta-repeats, that may explain the scaffolding required for synthetic complexes capable of making large toxins. Clustering analysis for each type of domain was also used to discern possible origins of duplication for the multitude of single domain transcripts. Single domain transcripts frequently clustered with synonymous domains from multi-domain transcripts such as the BurA and ZmaK like genes as well as the multi-ketosynthase genes, sometimes with a large degree of apparent gene duplication, while fatty acid synthesis genes formed distinct clusters. Surprisingly the acyl-transferases and ketoreductases involved in fatty acid synthesis (FabD and FabG, respectively) were found in very large clusters indicating an unprecedented degree of gene duplication for these genes. These results demonstrate a complex evolutionary history of core dinoflagellate modular synthases with domain specific duplications throughout the lineage as well as clues to how large protein complexes can be assembled to synthesize the largest natural products kn
{"title":"A Global Approach to Estimating the Abundance and Duplication of Polyketide Synthase Domains in Dinoflagellates.","authors":"Ernest P Williams, Tsvetan R Bachvaroff, Allen R Place","doi":"10.1177/11769343211031871","DOIUrl":"10.1177/11769343211031871","url":null,"PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211031871"},"PeriodicalIF":2.6,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/13/aa/10.1177_11769343211031871.PMC8283056.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39281379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-11; eCollection Date: 2021-01-01; DOI: 10.1177/11769343211023767
Kailin Mao, Fang Lin, Yingai Zhang, Hailong Zhou
Gefitinib resistance is a serious threat in the treatment of patients with non-small cell lung cancer (NSCLC). Elucidating the underlying mechanisms and developing effective therapies to overcome gefitinib resistance are urgently needed. Differentially expressed genes (DEGs) between gefitinib-sensitive and gefitinib-resistant samples were screened from the gene expression profile GSE122005. GO and KEGG analyses were performed with DAVID. A protein-protein interaction (PPI) network was established to visualize the DEGs and to screen for hub genes. The functional roles of CCL20 in lung adenocarcinoma (LUAD) were examined using gene set enrichment analysis (GSEA). Functional analysis revealed that the DEGs were mainly concentrated in inflammation, cell chemotaxis, and PI3K signal regulation. Ten hub genes were identified based on the PPI network. Survival analysis of the hub genes showed that CCL20 had a significant effect on the prognosis of LUAD patients. GSEA showed that the CCL20 high-expression group was mainly enriched in cytokine-related signaling pathways. In conclusion, our analysis suggests that changes in inflammation and cytokine-related signaling pathways are closely related to gefitinib resistance in patients with lung cancer. The CCL20 gene may promote the development of gefitinib resistance and may serve as a new biomarker for predicting gefitinib resistance in patients with lung cancer.
{"title":"Identification of Key Genes and Pathways in Gefitinib-Resistant Lung Adenocarcinoma using Bioinformatics Analysis.","authors":"Kailin Mao, Fang Lin, Yingai Zhang, Hailong Zhou","doi":"10.1177/11769343211023767","DOIUrl":"10.1177/11769343211023767","url":null,"PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"17 ","pages":"11769343211023767"},"PeriodicalIF":2.6,"publicationDate":"2021-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/11769343211023767","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39112216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
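The hub-gene step in the gefitinib abstract above (screening hub genes from a PPI network) can be illustrated with a minimal sketch. The abstract does not say which ranking criterion the authors used; node degree (number of distinct interaction partners) is assumed here as one common choice. The function name `rank_hub_genes`, the `top_n` parameter, and the placeholder gene names are hypothetical and are not data from the study.

```python
from collections import defaultdict


def rank_hub_genes(edges, top_n=10):
    """Rank nodes of an undirected PPI network by degree.

    edges: iterable of (gene_a, gene_b) pairs; duplicates and
    self-interactions are ignored.
    Returns up to top_n (gene, degree) tuples, highest degree first.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-interaction adds no partner
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, then by name so ties are deterministic.
    ranked = sorted(neighbors.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


# Toy edge list with placeholder names (assumption: illustrative only).
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
    ("GENE_B", "GENE_A"),  # same interaction as the first edge, counted once
]

hubs = rank_hub_genes(toy_edges, top_n=3)
print(hubs)
```

In practice the edge list would come from a PPI resource restricted to the DEGs, and other centrality measures could replace degree without changing the overall shape of the step.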