Evolutionary Bioinformatics: Latest Articles, Page 6

Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation With Rotation Forest.

IF 2.6, Zone 4 (Biology), Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2021-10-12 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211050067

Jie Pan, Li-Ping Li, Chang-Qing Yu, Zhu-Hong You, Yong-Jian Guan, Zhong-Hao Ren

Protein-protein interactions (PPIs) in plants are essential for understanding the regulation of biological processes. Although high-throughput technologies have been widely used to identify PPIs, they are usually laborious and expensive, and they suffer from high false-positive rates. Therefore, it is imperative to develop novel computational approaches as a supplementary tool to detect PPIs in plants. In this work, we present a method, DST-RoF, that identifies PPIs in plants by combining an ensemble learning classifier, Rotation Forest (RoF), with the discrete sine transformation (DST). Specifically, each plant protein sequence is first converted into a Position-Specific Scoring Matrix (PSSM). Then, the discrete sine transformation is employed to extract effective features that capture the evolutionary information of the proteins. Finally, these optimal features are fed into the RoF classifier for training and prediction. On the plant datasets Arabidopsis, Rice, and Maize, DST-RoF yielded prediction accuracies of 82.95%, 88.82%, and 93.70%, respectively. To further evaluate the prediction ability of our approach, we compared it with 4 state-of-the-art classifiers and 3 different feature extraction methods. Comprehensive experimental results indicate that our method is feasible and robust for predicting potential interacting protein pairs in plants.
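
The feature-extraction step described above can be illustrated with a minimal sketch. It assumes a type-II DST applied to each PSSM column, keeping the first few coefficients as a fixed-length feature vector; the abstract does not state the DST variant, the scaling, or how many coefficients are kept, so `dst_ii`, `pssm_to_features`, and `n_coeffs` are hypothetical names and choices. The Rotation Forest classifier is not shown.

```python
import math


def dst_ii(x):
    """Unnormalised type-II discrete sine transform of a 1-D sequence.

    X_k = sum_j x_j * sin(pi * (j + 1/2) * (k + 1) / N), for k = 0..N-1.
    Library implementations may differ from this by a constant factor.
    """
    n = len(x)
    return [
        sum(x[j] * math.sin(math.pi * (j + 0.5) * (k + 1) / n) for j in range(n))
        for k in range(n)
    ]


def pssm_to_features(pssm, n_coeffs=4):
    """Keep the first n_coeffs DST coefficients of every PSSM column.

    pssm: list of rows, one per residue, each holding 20 scores.
    Assumes the protein has at least n_coeffs residues, so that every
    protein maps to a vector of length 20 * n_coeffs.
    """
    n_cols = len(pssm[0])
    features = []
    for c in range(n_cols):
        column = [row[c] for row in pssm]
        features.extend(dst_ii(column)[:n_coeffs])
    return features
```

Truncating to low-order coefficients is one common way to obtain equal-length vectors from proteins of different lengths; a pair of proteins could then be represented by concatenating the two vectors before classification, which is again an assumption rather than a detail given in the abstract.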

Citations: 4

Compelling Evidence Suggesting the Codon Usage of SARS-CoV-2 Adapts to Human After the Split From RaTG13.

Pub Date : 2021-10-08 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211052013

Yanping Zhang, Xiaojie Jin, Haiyan Wang, Yaoyao Miao, Xiaoping Yang, Wenqing Jiang, Bin Yin

SARS-CoV-2 needs to make efficient use of host resources in order to survive and propagate. Among the multiple layers of the regulatory network, mRNA translation is the rate-limiting step in gene expression. Synonymous codon usage usually conforms with tRNA concentration to allow fast decoding during translation. It is acknowledged that SARS-CoV-2 has adapted to the codon usage of human lungs so that the virus can rapidly proliferate in the lung environment. While this notion seems to nicely explain the adaptation of SARS-CoV-2 to lungs, it cannot explain why other viruses do not have this advantage. In this study, we retrieve the GTEx RNA-seq data for 30 tissues (belonging to over 17 000 individuals). We calculate the RSCU (relative synonymous codon usage) weighted by gene expression in each human sample, and investigate the correlation of RSCU between the human tissues and SARS-CoV-2 or RaTG13 (the closest coronavirus to SARS-CoV-2). Lung has the highest correlation of RSCU with SARS-CoV-2 among all tissues, suggesting that the lung environment is generally suitable for SARS-CoV-2. Interestingly, for most tissues, the correlation between SARS-CoV-2 and the human samples is higher than the RaTG13-human correlation. This difference is most significant for lungs. In conclusion, the codon usage of SARS-CoV-2 has adapted to human lungs to allow fast decoding and translation. This adaptation probably took place after SARS-CoV-2 split from RaTG13, because RaTG13 is less well correlated with humans. This finding depicts the trajectory of adaptive evolution from the ancestral sequence to SARS-CoV-2, and also explains why SARS-CoV-2, rather than other viruses, could adapt so well to the human lung environment.
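
As a rough illustration of the expression-weighted RSCU calculation and the tissue-virus comparison described above, the sketch below weights each codon occurrence by the expression of the gene it comes from, computes RSCU per synonymous family under the standard genetic code, and correlates two RSCU profiles. The function names, the input layout (pairs of coding sequence and expression weight), and the choice of Pearson correlation are assumptions; the abstract does not specify them.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def weighted_rscu(genes):
    """RSCU from (coding_sequence, expression_weight) pairs.

    RSCU(codon) = weighted count * family size / weighted family total.
    Stop codons and single-codon families (Met, Trp) are skipped.
    """
    counts = Counter()
    for seq, weight in genes:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += weight
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    rscu = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        for c in codons:
            rscu[c] = counts[c] * len(codons) / total
    return rscu


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def rscu_correlation(profile_a, profile_b):
    """Correlate two RSCU profiles over the codons present in both."""
    shared = sorted(set(profile_a) & set(profile_b))
    return pearson([profile_a[c] for c in shared], [profile_b[c] for c in shared])
```

For a viral genome, every coding sequence could simply be given weight 1, so that the same function yields an unweighted RSCU profile to compare against each tissue sample.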

Citations: 0

Biomarkers of Blood from Patients with Atherosclerosis Based on Bioinformatics Analysis.

Pub Date : 2021-09-24 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211046020

Yongjiang Qian, Lili Zhang, Zhen Sun, Guangyao Zang, Yalan Li, Zhongqun Wang, Lihua Li

Atherosclerosis is a multifaceted disease characterized by the formation and accumulation of plaques that attach to arteries and cause cardiovascular disease and vascular embolism. A range of diagnostic techniques, including selective coronary angiography, stress tests, computerized tomography, and nuclear scans, assess cardiovascular disease risk and treatment targets. However, there is currently no simple blood biochemical index or biological target for the diagnosis of atherosclerosis. Therefore, it is of interest to find a biochemical blood marker for atherosclerosis. Three datasets from the Gene Expression Omnibus (GEO) database were analyzed to obtain differentially expressed genes (DEGs), and the results were integrated using the RobustRankAggreg algorithm. The genes ranked as more critical by the RobustRankAggreg algorithm were then verified in their own datasets and in a dataset with cell classification information. Twenty-one candidate genes were screened out. Interestingly, we found a good correlation between RPS4Y1, EIF1AY, and XIST. In addition, we describe the general expression of these genes in different cell types and in whole blood cells. In this study, we identified BTNL8 and BLNK as having good clinical significance. These results will contribute to the analysis of the underlying genes involved in the progression of atherosclerosis and provide insights for the discovery of new diagnostic and evaluation methods.

Citations: 3

Erratum to "On the matrix condition of phylogenetic tree".

Pub Date : 2021-09-09 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211046767

[This corrects the article DOI: 10.1177/1176934320901721.].

Citations: 0

Genome-Wide Phylogenetic Analysis, Expression Pattern, and Transcriptional Regulatory Network of the Pig C/EBP Gene Family.

Pub Date : 2021-08-26 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211041382

Chaoxin Zhang, Tao Wang, Tongyan Cui, Shengwei Liu, Bing Zhang, Xue Li, Jian Tang, Peng Wang, Yuanyuan Guo, Zhipeng Wang

The CCAAT/enhancer binding protein (C/EBP) transcription factors (TFs) regulate many important biological processes, such as energy metabolism, inflammation, and cell proliferation. A genome-wide gene identification revealed the presence of a total of 99 C/EBP genes in pig and 19 eukaryote genomes. Phylogenetic analysis showed that all C/EBP TFs were classified into 6 subgroups named C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, C/EBPγ, and C/EBPζ. Gene expression analysis showed that the C/EBPα, C/EBPβ, C/EBPδ, C/EBPγ, and C/EBPζ genes were expressed ubiquitously, with inconsistent expression patterns, in various pig tissues. Moreover, a pig C/EBP regulatory network was constructed, including C/EBP genes, TFs, and miRNAs. A total of 27 feed-forward loop (FFL) motifs were detected in the pig C/EBP regulatory network. Based on the RNA-seq data, gene expression patterns related to the FFL sub-network were analyzed in 27 adult pig tissues. Certain FFL motifs may be tissue specific. Functional enrichment analysis indicated that C/EBP and its target genes are involved in many important biological pathways. These results provide valuable information that clarifies the evolutionary relationships of the C/EBP family and contributes to the understanding of the biological function of C/EBP genes.
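
The feed-forward loop motif mentioned above has a simple graph definition: a regulator X targets both an intermediate Y and a gene Z, and Y also targets Z. The following minimal sketch enumerates such triples in a directed edge list; it says nothing about how the authors built or scanned their network, and `find_ffls` and the node names in the usage note are hypothetical.

```python
def find_ffls(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z.

    edges: iterable of (regulator, target) pairs; self-loops are ignored.
    """
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue
            # z must be regulated by y and directly by x as well.
            for z in targets.get(y, ()):
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)
```

For instance, the edges `("TF_A", "miR_X")`, `("miR_X", "gene_T")`, and `("TF_A", "gene_T")` form a single loop `("TF_A", "miR_X", "gene_T")`, while an isolated edge contributes none.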

Citations: 0

Whole Genome Sequencing of Sunflower Root-Associated Bacillus cereus.

Pub Date : 2021-08-16 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211038948

Olubukola Oluranti Babalola, Bartholomew Saanu Adeleke, Ayansina Segun Ayangbenro

In recent times, diverse agriculturally important endophytic bacteria colonizing the plant endosphere have been identified. Harnessing the potential of Bacillus species from sunflower could reveal their biotechnological and agricultural importance. Here, we present genomic insights into B. cereus T4S isolated from sunflower sourced from Lichtenburg, South Africa. Genome analysis revealed a sequence read count of 7 255 762, a genome size of 5 945 881 bp, and a G + C content of 34.8%. The genome contains protein-coding genes involved in various metabolic pathways. The detection of genes involved in the metabolism of organic substrates and chemotaxis could enhance plant-microbe interactions in the synthesis of biological products with biotechnological and agricultural importance.
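
The G + C content reported above is a simple proportion: the number of G and C bases divided by the number of unambiguous bases in the assembly. A minimal sketch follows, with `gc_content` as a hypothetical helper; whether ambiguous bases were excluded in the original analysis is not stated.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / called
```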

Citations: 8

Contrasting Patterns of Gene Duplication, Relocation, and Selection Among Human Taste Genes.

Pub Date : 2021-07-24 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211035141

Yupeng Wang, Ying Sun, Paule Valery Joseph

In humans, taste genes are responsible for perceiving at least 5 different taste qualities. The evolutionary mechanisms of human taste genes remain to be explored. We compiled a list of 69 human taste-related genes and divided them into 7 functional groups. We carried out comparative genomic and evolutionary analyses for these taste genes based on 8 vertebrate species. We found that, relative to other groups of human taste genes, human TAS2R genes have a higher proportion of tandem duplicates, suggesting that tandem duplications have contributed significantly to the expansion of the human TAS2R gene family. Human TAS2R genes tend to have fewer collinear genes in outgroup species and evolve faster, suggesting that human TAS2R genes have experienced more gene relocations. Moreover, human TAS2R genes tend to be under more relaxed purifying selection than other genes. Our study provides new insights into diverse and contrasting evolutionary patterns among human taste genes.

Citations: 2

The Evolution of G-quadruplex Structure in mRNA Untranslated Region.

Pub Date : 2021-07-21 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211035140

Ting Qi, Yuming Xu, Tong Zhou, Wanjun Gu

The RNA G-quadruplex (rG4) is a non-canonical high-order secondary structure with important biological functions and is enriched in untranslated regions (UTRs) of protein-coding genes. However, how rG4 structures evolve is largely unknown. Here, we systematically investigated the evolution of RNA sequences around UTR rG4 structures in 5 eukaryotic organisms. We found universal selection on UTR sequences, which facilitated rG4 formation in all the organisms that we analyzed. While G-rich sequences were preferred in the rG4 structural region, C-rich sequences were selected against. The selective pressure acting on rG4 structures in the UTRs of genes with higher G content was significantly weaker. Furthermore, we found that rG4 structures experienced weaker evolutionary selection near the translation initiation region in the 5' UTR, near the polyadenylation signals in the 3' UTR, and in regions flanking the miRNA targets in the 3' UTR. These results suggest universal selection for rG4 formation in the UTRs of eukaryotic genomes, and the selection may be related to the biological functions of rG4s.
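
For readers unfamiliar with why G-rich sequence favours rG4 formation, the sketch below scans a sequence for the widely used canonical quadruplex motif: four runs of at least three guanines separated by loops of one to seven nucleotides. This is a textbook pattern used purely for illustration; the abstract does not say how rG4 regions were identified, and `find_g4_motifs` is a hypothetical helper.

```python
import re

# Canonical motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (DNA or RNA alphabet).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping canonical G4 matches.

    Coordinates are 0-based and end-exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]
```

Applied to `"AAGGGAGGGAGGGAGGGAA"`, the function reports one motif spanning positions 2 to 17, whereas a sequence whose guanine runs are only two bases long yields no match.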

Citations: 2

A Global Approach to Estimating the Abundance and Duplication of Polyketide Synthase Domains in Dinoflagellates.

Pub Date : 2021-07-14 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211031871

Ernest P Williams, Tsvetan R Bachvaroff, Allen R Place

Many dinoflagellate species make toxins in a myriad of different molecular configurations, but the underlying chemistry in all cases is presumably via modular synthases, primarily polyketide synthases. In many organisms, modular synthases occur as discrete synthetic genes or domains within a gene that act in coordination, thus forming a module that produces a particular fragment of a natural product. The modules usually occur in tandem as gene clusters with a syntenic arrangement that is often predictive of the resultant structure. Dinoflagellate genomes, however, are notoriously complex, with individual genes present in many tandem repeats and very few synthetic modules occurring as gene clusters, unlike what has been seen in bacteria and fungi. However, modular synthesis in all organisms requires a free thiol group that acts as a carrier for sequential synthesis, called a thiolation domain. We scanned 47 dinoflagellate transcriptomes for 23 modular synthase domain models and compared their abundance among 10 orders of dinoflagellates as well as their co-occurrence with thiolation domains. The total count of domain types was quite large, with over 30 000 identified, 29 000 of which were in the core dinoflagellates. Although there were no specific trends in domain abundance associated with types of toxins, there were readily observable lineage-specific differences. The Gymnodiniales, makers of long polyketide toxins such as brevetoxin and karlotoxin, had a high relative abundance of thiolation domains as well as multiple thiolation domains within a single transcript. Orders such as the Gonyaulacales, makers of small polyketides such as spirolides, had fewer thiolation domains but a relative increase in the number of acyl transferases. Unique to the core dinoflagellates, however, were thiolation domains occurring alongside tetratricopeptide repeats that facilitate protein-protein interactions, especially hexa- and hepta-repeats, which may explain the scaffolding required for synthetic complexes capable of making large toxins. Clustering analysis for each type of domain was also used to discern possible origins of duplication for the multitude of single-domain transcripts. Single-domain transcripts frequently clustered with synonymous domains from multi-domain transcripts such as the BurA- and ZmaK-like genes as well as the multi-ketosynthase genes, sometimes with a large degree of apparent gene duplication, while fatty acid synthesis genes formed distinct clusters. Surprisingly, the acyl-transferases and ketoreductases involved in fatty acid synthesis (FabD and FabG, respectively) were found in very large clusters, indicating an unprecedented degree of gene duplication for these genes. These results demonstrate a complex evolutionary history of core dinoflagellate modular synthases, with domain-specific duplications throughout the lineage, as well as clues to how large protein complexes can be assembled to synthesize the largest natural products known.
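
The abundance and co-occurrence comparison described above reduces, at its simplest, to counting domain hits per transcript. The sketch below assumes the domain scan has already produced a mapping from transcript identifier to the list of domain names found on it; that input layout, the function name `domain_cooccurrence`, and the label `"thiolation"` are all hypothetical and stand in for whatever the authors' pipeline emitted.

```python
from collections import Counter


def domain_cooccurrence(hits, carrier="thiolation"):
    """Count domain abundance and co-occurrence with a carrier domain.

    hits: dict mapping transcript id -> list of domain names (repeats allowed).
    Returns (abundance, with_carrier): total hits per domain type, and the
    number of transcripts in which each other domain shares a transcript
    with at least one carrier domain.
    """
    abundance = Counter()
    with_carrier = Counter()
    for domains in hits.values():
        abundance.update(domains)
        if carrier in domains:
            with_carrier.update(set(domains) - {carrier})
    return abundance, with_carrier
```

Running the same count separately for each order would give the per-lineage relative abundances that the abstract compares.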

许多甲藻物种以各种不同的分子结构制造毒素，但所有情况下的基本化学反应都可能是通过模块合成酶（主要是多酮合成酶）进行的。在许多生物体中，模块合成酶是以离散的合成基因或基因内的结构域形式出现的，这些基因或结构域相互协调，从而形成一个模块，产生天然产物的特定片段。这些模块通常以基因簇的形式串联在一起，其同源排列通常可以预测最终的结构。然而，与细菌和真菌不同的是，甲藻的基因组是出了名的复杂，单个基因以许多串联重复的形式存在，很少有合成模块以基因簇的形式出现。然而，所有生物的模块合成都需要一个游离的硫醇基团作为载体，进行称为硫醇化结构域的连续合成。我们扫描了 47 个甲藻转录组，发现了 23 个模块化合成酶结构域模型，并比较了它们在 10 个甲藻纲中的丰度以及它们与硫代结构域的共存情况。经鉴定的结构域类型总数相当多，超过 3 万个，其中 29 000 个存在于核心甲藻中。虽然与毒素类型相关的结构域丰度并没有特定的趋势，但还是可以很容易地观察到特定品系的差异。制造长型多酮类毒素（如蒲公英毒素和卡洛托毒素）的裸鞭藻纲（Gymnodiniales）具有较高的硫醇化结构域相对丰度，并且在单个转录本中具有多个硫醇化结构域。制造螺环菌毒素等小型多酮化合物的 Gonyaulacales 目，其硫醇化结构域较少，但酰基转移酶的数量相对增加。不过，核心甲藻的独特之处在于硫醇化结构域与促进蛋白质间相互作用的四肽重复序列（尤其是六肽和七肽重复序列）同时出现，这可能解释了能够制造大型毒素的合成复合物所需的支架。我们还对每种结构域进行了聚类分析，以确定大量单结构域转录本可能的复制起源。单结构域转录本经常与来自多结构域转录本（如 BurA 和 ZmaK 类基因以及多酮合成酶基因）的同义结构域聚集在一起，有时存在大量明显的基因重复，而脂肪酸合成基因则形成了不同的聚集体。令人惊讶的是，参与脂肪酸合成的酰基转移酶和酮还原酶（分别为 FabD 和 FabG）形成了非常大的基因簇，表明这些基因的重复程度前所未有。这些结果表明了甲藻核心模块合成酶的复杂进化历史，其领域特异性复制贯穿整个品系，同时也为大型蛋白质复合物如何组装以合成已知最大的天然产物提供了线索。

Citations: 0

Identification of Key Genes and Pathways in Gefitinib-Resistant Lung Adenocarcinoma using Bioinformatics Analysis.

IF 2.6 Zone 4 (Biology) Q4 EVOLUTIONARY BIOLOGY

Evolutionary Bioinformatics

Pub Date : 2021-06-11 eCollection Date: 2021-01-01 DOI: 10.1177/11769343211023767

Kailin Mao, Fang Lin, Yingai Zhang, Hailong Zhou

Gefitinib resistance is a serious threat in the treatment of patients with non-small cell lung cancer (NSCLC). Elucidating the underlying mechanisms and developing effective therapies to overcome gefitinib resistance are urgently needed. Differentially expressed genes (DEGs) between gefitinib-sensitive and gefitinib-resistant samples were screened from the gene expression profile GSE122005. GO and KEGG analyses were performed with DAVID. A protein-protein interaction (PPI) network was established to visualize the DEGs and screen for hub genes. The functional roles of CCL20 in lung adenocarcinoma (LUAD) were examined using gene set enrichment analysis (GSEA). Functional analysis revealed that the DEGs were mainly concentrated in inflammation, cell chemotaxis, and PI3K signaling regulation. Ten hub genes were identified based on the PPI network. Survival analysis of the hub genes showed that CCL20 had a significant effect on the prognosis of LUAD patients. GSEA showed that the CCL20 high-expression group was mainly enriched in cytokine-related signaling pathways. In conclusion, our analysis suggests that changes in inflammation and cytokine-related signaling pathways are closely related to gefitinib resistance in patients with lung cancer. The CCL20 gene may promote the development of gefitinib resistance and may serve as a new biomarker for predicting gefitinib resistance in patients with lung cancer.
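As a rough illustration of the hub-gene step, the sketch below ranks the nodes of a PPI network by degree (number of distinct interaction partners) and keeps the top ten. Degree is only one common criterion; the abstract does not say which ranking the authors used, so this is an assumption. The function name `top_hub_genes`, the edge-list input format, and the `GENE_*` placeholders are likewise hypothetical, and the toy edges are invented rather than real interactions.

```python
from collections import defaultdict


def top_hub_genes(edges, k=10):
    """Rank genes in an undirected PPI network by degree.

    edges: iterable of (gene_a, gene_b) pairs; duplicates and self-loops
    are ignored. Returns up to k (gene, degree) tuples, highest degree
    first, with ties broken alphabetically.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-interaction does not add a partner
        neighbors[a].add(b)
        neighbors[b].add(a)

    ranked = sorted(neighbors, key=lambda gene: (-len(neighbors[gene]), gene))
    return [(gene, len(neighbors[gene])) for gene in ranked[:k]]


# Invented toy network with placeholder gene names.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]
for gene, degree in top_hub_genes(toy_edges, k=3):
    print(gene, degree)
```

Other centrality measures (betweenness, closeness, and so on) can replace the degree in the sort key without changing the rest of the function.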

Citations: 4