Pub Date : 2020-10-10 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320948848
Yang Sun, Lianwei Li, Aiyun Lai, Wanmeng Xiao, Kunhua Wang, Lan Wang, Junkun Niu, Juan Luo, Hongju Chen, Lin Dai, Yinglei Miao
The dysbiosis of the gut microbiome associated with ulcerative colitis (UC) has been extensively studied in recent years. However, the question of whether UC influences the spatial heterogeneity of the human gut mucosal microbiome has not been addressed. Spatial heterogeneity (specifically, the inter-individual heterogeneity in microbial species abundances) is one of the most important characteristics at both population and community scales, and can be assessed and interpreted by Taylor's power law (TPL) and its community-scale extensions (TPLEs). Due to the high mobility of microbes, it is difficult to investigate their spatial heterogeneity explicitly; however, TPLE offers an effective approach to analyzing microbial communities implicitly. Here, we investigated the influence of UC on the spatial heterogeneity of the gut microbiome with intestinal mucosal microbiome samples collected from 28 UC patients and healthy controls. Specifically, we applied Type-I TPLE for measuring community spatial heterogeneity and Type-III TPLE for measuring mixed-species population heterogeneity to evaluate the heterogeneity changes of the mucosal microbiome induced by UC at both the community and species scales. We further used a permutation test to assess differences between UC patients and healthy controls in the heterogeneity scaling parameters. Results showed that UC did not significantly influence gut mucosal microbiome heterogeneity at either the community or mixed-species level. These findings demonstrated significant resilience of the human gut microbiome and confirmed a prediction of TPLE: that the inter-subject heterogeneity scaling parameter of the gut microbiome is an intrinsic property of humans, invariant to UC.
Title: Does Ulcerative Colitis Influence the Inter-individual Heterogeneity of the Human Intestinal Mucosal Microbiome?
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320948848
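The abstract above relies on Taylor's power law, V = a·m^b, and on a permutation test of the scaling parameter b between groups. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes each sample is summarized as a (mean, variance) pair, fits the law by ordinary least squares on the log-log scale, and permutes group labels to test the difference in b. The function names `fit_tpl` and `permutation_p_value` are hypothetical.

```python
import math
import random


def fit_tpl(means, variances):
    """Fit log(V) = log(a) + b*log(m) by ordinary least squares; return (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in the fitted slope b.

    Each group is a list of (mean, variance) pairs, one pair per sample.
    Assumption: group labels are exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)

    def slope(pairs):
        return fit_tpl([m for m, _ in pairs], [v for _, v in pairs])[1]

    observed = abs(slope(group_a) - slope(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:len(group_a)]
        perm_b = pooled[len(group_a):]
        if abs(slope(perm_a) - slope(perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A large p-value from such a test would be consistent with the reported result that b does not differ detectably between UC patients and controls.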
Pub Date : 2020-10-01 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320954870
Jie-Mei Yu, Li-Shu Zhang, Yuan-Hui Fu, Feng-Min Ji, Han-Li Xu, Jia-Qiang Huang, Xiang-Lei Peng, Yan-Peng Zheng, Ying Zhang, Jin-Sheng He
Monitoring the mutation and evolution of the virus is important for tracing its ongoing transmission and facilitating effective vaccine development. A total of 342 complete genomic sequences of SARS-CoV-2 were analyzed in this study. Compared to the reference genome reported in December 2019, 465 mutations were found, of which 347 occurred in only 1 sequence and 26 occurred in more than 5 sequences. Of these 26, which were further identified as SNPs, 14 were closely linked and were grouped into 5 profiles. Phylogenetic analysis revealed that the sequences formed 2 major groups. Most of the sequences from the late period (March and April) constituted Cluster II, while the sequences before March in this study and the S/L and A/B/C types reported in previous studies were all in Cluster I. The distributions of some mutations were geographically or temporally specific; their potential effects on the transmission and pathogenicity of SARS-CoV-2 deserve further evaluation and monitoring. Two mutations were found in the receptor-binding domain (RBD) but outside the receptor-binding motif (RBM), indicating that these mutations may have only marginal biological effects but merit further attention. The observed novel sequence divergence is of great significance to the study of the transmission and pathogenicity of SARS-CoV-2 and to the development of an effective vaccine.
Title: Analysis of Continuous Mutation and Evolution on Circulating SARS-CoV-2.
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320954870
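The counts in the abstract above (mutations seen in only 1 sequence versus in more than 5) come from comparing each genome with a reference. As a rough illustration only, the sketch below assumes the sequences are already aligned to the reference and have the same length, and tallies in how many sequences each substitution occurs. The helper names `call_mutations` and `mutation_frequencies` are hypothetical and do not reflect the authors' tooling.

```python
from collections import Counter


def call_mutations(reference, sequence):
    """Return the set of (position, ref_base, alt_base) substitutions.

    Positions are 1-based. Ambiguous bases (N) and gaps (-) are skipped.
    Assumption: `sequence` is already aligned to `reference`.
    """
    return {
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sequence))
        if r != s and s not in ("N", "-")
    }


def mutation_frequencies(reference, sequences):
    """Count in how many sequences each substitution occurs."""
    counts = Counter()
    for seq in sequences:
        counts.update(call_mutations(reference, seq))
    return counts


# Toy example with made-up sequences (not real SARS-CoV-2 data).
reference = "ACGTACGT"
sequences = ["ACGTACGT", "ATGTACGT", "ATGTACGA", "ACGNACGT"]
freqs = mutation_frequencies(reference, sequences)
singletons = [m for m, c in freqs.items() if c == 1]
recurrent = [m for m, c in freqs.items() if c > 1]
```

With real data, the recurrence threshold would follow the study's criterion (more than 5 sequences) rather than the toy cutoff used here.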
The detection of copy number variations (CNVs) on whole-exome sequencing (WES) represents a cost-effective technique for the study of genetic variants. This approach, however, has encountered an obstacle with high false-positive rates due to biases from exome sequencing capture kits and GC content. Although plenty of CNV detection tools have been developed, they do not perform well with all types of CNVs. In addition, most tools lack features of genetic annotation, CNV visualization, and flexible installation, requiring users to put much effort into CNV interpretation. Here, we present "inCNV," a web-based application that can accept multiple CNV-tool results, then integrate and prioritize them with user-friendly interfaces. This application helps users analyze the importance of called CNVs by generating CNV annotations from Ensembl, Database of Genomic Variants (DGV), ClinVar, and Online Mendelian Inheritance in Man (OMIM). Moreover, users can select and export CNVs of interest including their flanking sequences for primer design and experimental verification. We demonstrated how inCNV could help users filter and narrow down the called CNVs to a potentially novel CNV, a common CNV within a group of samples of the same disease, or a de novo CNV of a sample within the same family. We have also provided inCNV as a Docker image for ease of installation (https://github.com/saowwapark/inCNV).
Title: inCNV: An Integrated Analysis Tool for Copy Number Variation on Whole Exome Sequencing.
Authors: Saowwapark Chanwigoon, Sakkayaphab Piwluang, Duangdao Wichadakul
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320956577. Pub Date: 2020-09-24. DOI: 10.1177/1176934320956577
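Integrating calls from several CNV tools, as described above, usually involves merging overlapping intervals and recording which tools support each merged region. The sketch below shows one simple way to do that. It is an illustration under stated assumptions, not inCNV's actual algorithm: the tuple layout, the rule that any overlap merges, and the name `merge_cnv_calls` are all hypothetical.

```python
def merge_cnv_calls(calls):
    """Merge overlapping CNV calls of the same type on the same chromosome.

    calls: iterable of (chrom, start, end, cnv_type, tool) tuples.
    Returns a list of dicts, each with the merged interval and the set of
    tools that reported a call overlapping it.
    """
    merged = []
    # Sorting by chromosome and type first keeps mergeable calls adjacent.
    ordered = sorted(calls, key=lambda c: (c[0], c[3], c[1], c[2]))
    for chrom, start, end, cnv_type, tool in ordered:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["chrom"] == chrom
            and last["type"] == cnv_type
            and start <= last["end"]
        ):
            last["end"] = max(last["end"], end)
            last["tools"].add(tool)
        else:
            merged.append(
                {"chrom": chrom, "start": start, "end": end,
                 "type": cnv_type, "tools": {tool}}
            )
    return merged


# Toy calls from three placeholder tools.
calls = [
    ("chr1", 100, 200, "DEL", "toolA"),
    ("chr1", 150, 300, "DEL", "toolB"),
    ("chr1", 500, 600, "DUP", "toolA"),
    ("chr2", 100, 200, "DEL", "toolC"),
]
regions = merge_cnv_calls(calls)
# Regions supported by more than one tool could be prioritized first.
multi_tool = [r for r in regions if len(r["tools"]) > 1]
```

Ranking merged regions by the number of supporting tools is one plausible way to reduce the false positives the abstract mentions; a stricter rule, such as a minimum reciprocal overlap, could replace the any-overlap test.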
Hepatocellular carcinoma (HCC) is a common malignant tumor representing more than 90% of primary liver cancer. This study aimed to identify metabolism-related biomarkers with prognostic value by developing a novel prognostic score (PS) model. Transcriptomic profiles derived from TCGA and EBIArray databases were analyzed to identify differentially expressed genes (DEGs) in HCC tumor samples compared with normal samples. The overlapped genes between DEGs and metabolism-related genes (crucial genes) were screened and functionally analyzed. A novel PS model was constructed to identify optimal signature genes. Cox regression analysis was performed to identify independent clinical factors related to prognosis. A nomogram model was constructed to estimate the predictability of clinical factors. Finally, protein expression of crucial genes was explored in different cancer tissues and cell types from the Human Protein Atlas (HPA). We screened a total of 305 overlapped genes (differentially expressed metabolism-related genes). These genes were mainly involved in "oxidation reduction," "steroid hormone biosynthesis," "fatty acid metabolic process," and "linoleic acid metabolism." Furthermore, we screened ten optimal DEGs (CYP2C9, CYP3A4, and TKT, among others) by using the PS model. Two clinical factors, pathologic stage (P < .001, HR: 1.512 [1.219-1.875]) and PS status (P < .001, HR: 2.259 [1.522-3.354]), were independent prognostic predictors by Cox regression analysis. The nomogram model showed a high predicted probability of overall survival time, and the AUC value was 0.837. The expression status of 7 proteins was frequently altered in normal or differential tumor tissues, such as liver cancer and stomach cancer samples. We have identified several metabolism-related biomarkers for prognosis prediction of HCC based on the PS model. Two clinical factors, pathologic stage and PS status (high/low risk), were independent prognostic predictors. The prognosis prediction model described in this study is a useful and stable method for novel biomarker identification.
Title: Prognostic Score-based Clinical Factors and Metabolism-related Biomarkers for Predicting the Progression of Hepatocellular Carcinoma.
Authors: Jia Yan, Ming Shu, Xiang Li, Hua Yu, Shuhuai Chen, Shujie Xie
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320951571. Pub Date: 2020-09-22. DOI: 10.1177/1176934320951571
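The abstract above does not spell out how the prognostic score is computed. A common construction, assumed here purely for illustration, is a weighted sum of signature-gene expression values using Cox regression coefficients, followed by a median split into high- and low-risk groups. The coefficients and expression values below are invented placeholders, and the function names are hypothetical.

```python
from statistics import median


def prognostic_scores(expression, coefficients):
    """Compute a linear score per sample: sum of coefficient * expression.

    expression: dict sample -> dict gene -> expression value.
    coefficients: dict gene -> weight (e.g. a Cox coefficient; assumed).
    """
    return {
        sample: sum(coefficients[g] * values[g] for g in coefficients)
        for sample, values in expression.items()
    }


def risk_groups(scores):
    """Label samples 'high' if their score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Placeholder weights and expression values, not taken from the study.
coefficients = {"CYP2C9": -0.2, "TKT": 0.5}
expression = {
    "s1": {"CYP2C9": 1.0, "TKT": 2.0},
    "s2": {"CYP2C9": 0.0, "TKT": 4.0},
    "s3": {"CYP2C9": 5.0, "TKT": 1.0},
}
scores = prognostic_scores(expression, coefficients)
groups = risk_groups(scores)
```

The resulting high/low label plays the role of the "PS status" factor that the abstract reports as an independent predictor alongside pathologic stage.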
Pub Date : 2020-09-11 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320941495
Jun Yang, Peng Xu, Diqiu Yu
Rice (Oryza sativa) yield is correlated with various factors. Transcription regulators are important among these factors, such as the typical SHORT INTERNODES-related sequences (SRSs), which encode proteins with single zinc finger motifs. Nevertheless, knowledge regarding the evolutionary and functional characteristics of the SRS gene family members in rice is insufficient. Therefore, we performed a genome-wide screening and characterization of the OsSRS gene family in Oryza sativa japonica rice. We also examined the SRS proteins from 11 rice sub-species, consisting of 3 cultivars, 6 wild varieties, and 2 other genome types. SRS members from maize, sorghum, Brachypodium distachyon, and Arabidopsis were also investigated. All these SRS proteins exhibited species-specific characteristics, as well as monocot- and dicot-specific characteristics, as assessed by phylogenetic analysis, which was further validated by gene structure and motif analyses. Genome comparisons revealed that segmental duplications may have played significant roles in the recombination of the OsSRS gene family and in the expression levels of its members. The family was mainly subjected to purifying selective pressure. In addition, the expression data demonstrated the distinct responses of OsSRS genes to various abiotic stresses and hormonal treatments, indicating their functional divergence. Our study provides a good reference for elucidating the functions of SRS genes in rice.
Title: Genome-Wide Identification and Characterization of the SHI-Related Sequence Gene Family in Rice.
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320941495
Pub Date : 2020-09-02 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320941500
Kabita Baral, Peter Rotwein
Recent advances in genetics present unique opportunities for enhancing our understanding of human physiology and disease predisposition through detailed analysis of gene structure, expression, and population variation via examination of data in publicly accessible genome and gene expression repositories. Yet, the vast majority of human genes remain understudied. Here, we show the scope of these genomic and genetic resources by evaluating ZMAT2, a member of a 5-gene family that through May 2020 had been the focus of only 4 peer-reviewed scientific publications. Using analysis of information extracted from public databases, we show that human ZMAT2 is a 6-exon gene and find that it exhibits minimal genetic variation in human populations and in disease states, including cancer. We further demonstrate that the gene and its encoded protein are highly conserved among nonhuman primates and define a cohort of ZMAT2 pseudogenes in the marmoset genome. Collectively, our investigations illustrate how complementary use of genomic, gene expression, and population genetic resources can lead to new insights about human and mammalian biology and evolution, and when coupled with data supporting key roles for ZMAT2 in keratinocyte differentiation and pre-RNA splicing argue that this gene is worthy of further study.
Title: ZMAT2 in Humans and Other Primates: A Highly Conserved and Understudied Gene.
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320941500
Triple-negative breast cancer (TNBC) is the most aggressive and fatal sub-type of breast cancer. This study aimed to identify metastasis-associated genes that could serve as biomarkers for TNBC diagnosis and prognosis. RNA-seq data and clinical information on TNBC from the Cancer Genome Atlas were used to conduct analyses. Expression data were used to establish co-expression modules using average linkage hierarchical clustering. We used weighted gene co-expression network analysis to explore the associations between gene sets and clinical features and to identify metastasis-associated candidate biomarkers. The K-M plotter website was used to explore the association between the expression of candidate biomarkers and patient survival. In addition, receiver operating characteristic curve analysis was used to illustrate the diagnostic performance of candidate genes. The pale turquoise module was significantly associated with the occurrence of metastasis. In this module, 64 genes were identified, and functional enrichment analysis revealed that they were mainly associated with transcriptional misregulation in cancer, microRNAs in cancer, and negative regulation of angiogenesis. Further, 4 genes, IGSF10, RUNX1T1, XIST, and TSHZ2, which were negatively associated with relapse-free survival and have seldom been reported before in TNBC, were selected. In addition, the mRNA expression levels of the 4 candidate genes were significantly lower in TNBC tumor tissues compared with healthy tissues. Based on the K-M plotter, these 4 genes were correlated with poor prognosis of TNBC. The areas under the curve of IGSF10, RUNX1T1, TSHZ2, and XIST were 0.918, 0.957, 0.977, and 0.749, respectively. These findings provide new insight into TNBC metastasis. IGSF10, RUNX1T1, TSHZ2, and XIST could be used as candidate biomarkers for the diagnosis and prognosis of TNBC metastasis.
Title: Identification of Metastasis-Associated Genes in Triple-Negative Breast Cancer Using Weighted Gene Co-expression Network Analysis.
Authors: Wenting Xie, Zhongshi Du, Yijie Chen, Naxiang Liu, Zhaoming Zhong, Youhong Shen, Lina Tang
Journal: Evolutionary Bioinformatics, volume 16, article 1176934320954868. Pub Date: 2020-09-01. DOI: 10.1177/1176934320954868
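The area under the ROC curve reported above can be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. It is a generic illustration rather than the authors' analysis, and the scores are invented; the name `roc_auc` is hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive score is higher and 0.5 on a tie.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented scores. For a gene that is lower in tumors, one would negate
# the expression values (or swap the groups) so that higher means "tumor".
tumor_scores = [3.0, 4.0, 5.0]
healthy_scores = [1.0, 2.0, 3.0]
auc = roc_auc(tumor_scores, healthy_scores)
```

This quadratic-time form is fine for small cohorts; a rank-based computation gives the same value more efficiently on large ones.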
Pub Date : 2020-07-27 eCollection Date: 2020-01-01 DOI: 10.1177/1176934320942192
Su Xu, Jianjun Cheng, Xiangchen Meng, Yan Xu, Ying Mu
Lactobacillus reuteri YSJL-12 was isolated from the fresh feces of healthy sows and was previously used as a probiotic additive. To investigate the genetic basis of its probiotic potential and identify the genes in the strain, the complete genome of YSJL-12 was sequenced. A comparative genome analysis of 9 strains of Lactobacillus reuteri was then performed. The genome of YSJL-12 consisted of a circular 2,084,748 bp chromosome and 2 circular plasmids (51,906 and 15,134 bp). Among the 2065 protein-coding sequences (CDSs), genes conferring resistance to environmental stress were identified. The functions of COG (Clusters of Orthologous Group) protein genes were predicted, and the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were analyzed. The comparative genome analysis indicated that the pan-genome contained a core genome of 1257 orthologous gene clusters, an accessory genome of 1064 orthologous gene clusters, and 1148 strain-specific genes, and that the antibacterial mechanism might differ among Lactobacillus reuteri strains. The phylogenetic analysis and genomic collinearity revealed that the phylogenetic relationship among the 9 strains of Lactobacillus reuteri was connected with host species and showed host specificity. This research could help to better predict gene function and understand the genetic basis of adaptation to the host gut in Lactobacillus reuteri YSJL-12.
{"title":"Complete Genome and Comparative Genome Analysis of <i>Lactobacillus reuteri</i> YSJL-12, a Potential Probiotics Strain Isolated From Healthy Sow Fresh Feces.","authors":"Su Xu, Jianjun Cheng, Xiangchen Meng, Yan Xu, Ying Mu","doi":"10.1177/1176934320942192","DOIUrl":"https://doi.org/10.1177/1176934320942192","url":null,"abstract":"<p><p><i>Lactobacillus reuteri</i> YSJL-12 was isolated from healthy sow fresh feces and used as probiotics additives previously. To investigate the genetic basis on probiotic potential and identify the genes in the strain, the complete genome of YSJL-12 was sequenced. Then comparative genome analysis on 9 strains of <i>Lactobacillus reuteri</i> was performed. The genome of YSJL-12 consisted of a circular 2,084,748 bp chromosome and 2 circular plasmids (51,906 and 15,134 bp). From among the 2065 protein-coding sequences (CDSs), the genes resistant to the environmental stress were identified. The function of COG (Clusters of Orthologous Group) protein genes was predicted, and the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were analyzed. The comparative genome analysis indicated that the pan-genome contained a core genome of 1257 orthologous gene clusters, an accessory genome of 1064 orthologous gene clusters, and 1148 strain-specific genes, and the antibacterial mechanism among <i>Lactobacillus reuteri</i> strains might be different. The phylogenetic analysis and genomic collinearity revealed that the phylogenetic relationship among 9 strains of <i>Lactobacillus reuteri</i> was connected with host species and showed host specificity. 
The research could help us to better predict genes function and understand genetic basis on adapting to host gut in <i>Lactobacillus reuteri</i> YSJL-12.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"16 ","pages":"1176934320942192"},"PeriodicalIF":2.6,"publicationDate":"2020-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/1176934320942192","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38262586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
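The core / accessory / strain-specific split described in the abstract above can be illustrated with a minimal sketch. It assumes a presence table that maps each orthologous gene cluster to the set of strains carrying it; the function name, cluster IDs, and strain labels are hypothetical and are not taken from the paper, which does not state which tool produced its partition.

```python
from typing import Dict, Set


def partition_pan_genome(
    presence: Dict[str, Set[str]], n_strains: int
) -> Dict[str, Set[str]]:
    """Split gene clusters into core, accessory, and strain-specific sets.

    presence maps a cluster ID to the set of strains that carry it.
    A cluster is core if every strain has it, strain-specific if exactly
    one strain has it, and accessory otherwise.
    """
    parts: Dict[str, Set[str]] = {
        "core": set(),
        "accessory": set(),
        "strain_specific": set(),
    }
    for cluster, strains in presence.items():
        if len(strains) == n_strains:
            parts["core"].add(cluster)
        elif len(strains) == 1:
            parts["strain_specific"].add(cluster)
        elif len(strains) > 1:
            parts["accessory"].add(cluster)
    return parts


# Toy input with three hypothetical strains (not the 9 strains of the study).
toy_presence = {
    "cluster_A": {"s1", "s2", "s3"},  # present everywhere
    "cluster_B": {"s1", "s3"},        # shared by some strains
    "cluster_C": {"s2"},              # unique to one strain
}
toy_parts = partition_pan_genome(toy_presence, n_strains=3)
```

On real data the presence table would come from an ortholog-clustering step, and the thresholds (for example, a relaxed "soft core" cutoff) are a design choice that the abstract does not specify.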
Pub Date: 2020-07-27 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320924681
Efraín Hernando Pinzón-Reyes, Daniel Alfonso Sierra-Bueno, Miguel Orlando Suarez-Barrera, Nohora Juliana Rueda-Forero, Sebastián Abaunza-Villamizar, Paola Rondón-Villareal
Directed evolution methods mimic Darwinian evolution in vitro, inducing random mutations and applying selective pressure to genes to obtain proteins with enhanced characteristics. These techniques are developed by trial-and-error testing at the experimental level, with a high degree of uncertainty. Therefore, in silico modeling of directed evolution is required to support experimental assays. Several in silico approaches have reproduced directed evolution, using statistical, thermodynamic, and kinetic models in an attempt to recreate experimental conditions. Likewise, optimization techniques using heuristic models have been used to understand and find the best scenarios of directed evolution. Our study uses an in silico model named HeurIstics DirecteD EvolutioN, which is based on a genetic algorithm designed to generate chimeric libraries from 2 parental genes, cry11Aa and cry11Ba, of Bacillus thuringiensis. These genes encode crystal-shaped δ-endotoxins with 3 conserved domains. Cry11 toxins are of biotechnological interest because they have been shown to be effective biopesticides against disease-spreading vectors. With our heuristic model, we considered experimental parameters such as DNA fragmentation length, number of generations or simulation cycles, and mutation rate, to obtain characteristics of Cry11 chimeric libraries such as percentage of population identity, truncation of variants caused by internal stop codons, percentage of thermodynamic diversity, and stability of variants. Our study allowed us to focus on experimental conditions that may be useful for the design of in vitro and in silico experiments of directed evolution with Cry toxins of 3 conserved domains. Furthermore, we obtained in silico libraries of Cry11 variants, in which structural characteristics of wild Cry families were observed in a review of a sample of in silico sequences. We consider that future studies could use our in silico libraries and heuristic computational models, such as the one suggested here, to support in vitro experiments of directed evolution.
{"title":"Generation of Cry11 Variants of <i>Bacillus thuringiensis</i> by Heuristic Computational Modeling.","authors":"Efraín Hernando Pinzón-Reyes, Daniel Alfonso Sierra-Bueno, Miguel Orlando Suarez-Barrera, Nohora Juliana Rueda-Forero, Sebastián Abaunza-Villamizar, Paola Rondón-Villareal","doi":"10.1177/1176934320924681","DOIUrl":"https://doi.org/10.1177/1176934320924681","url":null,"abstract":"<p><p>Directed evolution methods mimic in vitro Darwinian evolution, inducing random mutations and selective pressure in genes to obtain proteins with enhanced characteristics. These techniques are developed using trial-and-error testing at an experimental level with a high degree of uncertainty. Therefore, in silico modeling of directed evolution is required to support experimental assays. Several in silico approaches have reproduced directed evolution, using statistical, thermodynamic, and kinetic models in an attempt to recreate experimental conditions. Likewise, optimization techniques using heuristic models have been used to understand and find the best scenarios of directed evolution. Our study uses an in silico model named HeurIstics DirecteD EvolutioN, which is based on a genetic algorithm designed to generate chimeric libraries from 2 parental genes, <i>cry11Aa</i> and <i>cry11Ba</i>, of <i>Bacillus thuringiensis</i>. These genes encode crystal-shaped δ-endotoxins with 3 conserved domains. <i>Cry11</i> toxins are of biotechnological interest because they have shown to be effective as biopesticides for disease-spreading vectors. With our heuristic model, we considered experimental parameters such as DNA fragmentation length, number of generations or simulation cycles, and mutation rate, to get characteristics of <i>Cry11</i> chimeric libraries such as percentage of population identity, truncation of variants obtained from the presence of internal stop codons, percentage of thermodynamic diversity, and stability of variants. 
Our study allowed us to focus on experimental conditions that may be useful for the design of in vitro and in silico experiments of directed evolution with <i>Cry</i> toxins of 3 conserved domains. Furthermore, we obtained in silico libraries of <i>Cry11</i> variants, in which structural characteristics of wild <i>Cry</i> families were observed in a review of a sample of in silico sequences. We consider that future studies could use our in silico libraries and heuristic computational models, as the one suggested here, to support in vitro experiments of directed evolution.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"16 ","pages":"1176934320924681"},"PeriodicalIF":2.6,"publicationDate":"2020-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/1176934320924681","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38262585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
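The parameters named in the abstract above (fragment length, number of generations, mutation rate) and two of its library characteristics (percent identity, truncation by internal stop codons) can be illustrated with a toy loop. This is a sketch of fragment-based recombination plus point mutation, with no fitness-based selection step; it is not the authors' HeurIstics DirecteD EvolutioN model. All function names, the equal-length parent assumption, and the toy sequences are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def crossover(parent_a: str, parent_b: str, fragment_len: int, rng: random.Random) -> str:
    """Build a chimera by taking each fragment from one of two equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes aligned, equal-length parents")
    pieces = []
    for start in range(0, len(parent_a), fragment_len):
        source = rng.choice((parent_a, parent_b))
        pieces.append(source[start:start + fragment_len])
    return "".join(pieces)


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Resample each base with probability `rate` (the new base may equal the old one)."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else base for base in seq)


def has_internal_stop(seq: str) -> bool:
    """True if a stop codon occurs in frame before the final codon."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def evolve(parent_a: str, parent_b: str, generations: int, pop_size: int,
           fragment_len: int, rate: float, seed: int = 0) -> list:
    """Run the recombine-then-mutate loop and return the final library."""
    rng = random.Random(seed)
    population = [parent_a, parent_b]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            children.append(mutate(crossover(a, b, fragment_len, rng), rate, rng))
        population = children
    return population


# Toy parents (hypothetical sequences, not cry11Aa / cry11Ba).
library = evolve("A" * 12, "C" * 12, generations=3, pop_size=4,
                 fragment_len=3, rate=0.0, seed=1)
```

With `rate=0.0` every variant in `library` is a pure mosaic of the two parents, which makes the effect of fragment length easy to inspect before mutation is switched on.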
Pub Date: 2020-07-27 | eCollection Date: 2020-01-01 | DOI: 10.1177/1176934320944932
Yi-Pin Lai, Thomas R Ioerger
Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step which identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations), compared with (site- or gene-based) GEMMA, as representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.
{"title":"Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes.","authors":"Yi-Pin Lai, Thomas R Ioerger","doi":"10.1177/1176934320944932","DOIUrl":"https://doi.org/10.1177/1176934320944932","url":null,"abstract":"<p><p>Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step which identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) <i>Mycobacterium tuberculosis</i> from Lima, Peru. 
We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations), compared with (site- or gene-based) GEMMA, as representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.</p>","PeriodicalId":50472,"journal":{"name":"Evolutionary Bioinformatics","volume":"16 ","pages":"1176934320944932"},"PeriodicalIF":2.6,"publicationDate":"2020-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/1176934320944932","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38255196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
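The preprocessing idea in the abstract above, flagging genome regions with statistically significant clusters of independent evolutionary changes, can be sketched as a simple window scan. This is not the ECAT algorithm: it assumes fixed, non-overlapping windows, a uniform Poisson background, and no multiple-testing correction, and the function names and toy coordinates are hypothetical. In ECAT the flagged regions are then passed to association testing with GEMMA, which is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def hypervariable_windows(
    positions: Sequence[int], genome_length: int, window: int, alpha: float
) -> List[Tuple[int, int, int, float]]:
    """Flag windows holding more independent changes than a uniform background predicts.

    positions holds one genome coordinate per independent evolutionary change
    (a coordinate may repeat when the same site mutated more than once).
    Returns (start, end, count, p_value) for each window with p_value < alpha.
    """
    expected = len(positions) * window / genome_length  # mean changes per window
    hits = []
    for start in range(0, genome_length, window):
        count = sum(start <= p < start + window for p in positions)
        p_value = poisson_sf(count, expected)
        if p_value < alpha:
            hits.append((start, start + window, count, p_value))
    return hits


# Toy data: five changes packed into one 100 bp window, one elsewhere.
toy_positions = [105, 110, 118, 121, 130, 900]
toy_hits = hypervariable_windows(toy_positions, genome_length=1000, window=100, alpha=0.01)
```

Restricting later association tests to the flagged windows is what reduces the number of hypotheses; a production version would need a correction for the number of windows scanned.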