Tao Yu, Yingfeng Luo, Xinyu Tan, Dahe Zhao, Xiaochun Bi, Chenji Li, Yanning Zheng, Hua Xiang, Songnian Hu
Abstract Cold seeps in the deep sea are closely linked to energy exploration as well as global climate change. The alkane-dominated chemical energy-driven model makes cold seeps an oasis of deep-sea life, showcasing an unparalleled reservoir of microbial genetic diversity. By analyzing 113 metagenomes collected from 14 global sites across 5 cold seep types, we present a comprehensive Cold Seep Microbiomic Database (CSMD) to archive the genomic and functional diversity of the cold seep microbiome. The CSMD included over 49 million non-redundant genes and 3175 metagenome-assembled genomes (MAGs), which represented 1895 species spanning 105 phyla. In addition, beta diversity analysis indicated that both the sampling site and the cold seep type had a substantial impact on the composition of the prokaryotic community. Heterotrophic and anaerobic metabolisms were prevalent in microbial communities, accompanied by a considerable proportion of mixotrophs and facultative anaerobes, highlighting the versatile metabolic potential in cold seeps. Furthermore, secondary metabolic gene cluster analysis indicated that at least 98.81% of the sequences potentially encoded novel natural products, with ribosomally synthesized and post-translationally modified peptides (RiPPs) being the predominant type, widely distributed in archaea and bacteria. Overall, the CSMD represents a valuable resource that should enhance the understanding and utilization of global cold seep microbiomes.
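The abstract does not say which beta diversity metric was used. As an illustration only, the sketch below computes Bray-Curtis dissimilarity, one commonly used metric, between two samples. The function name and the toy abundance vectors are hypothetical and are not taken from the CSMD.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Assumes both vectors list the same taxa in the same order.
    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one sample must contain counts")
    # Abundance shared by both samples, taxon by taxon.
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / total


# Hypothetical counts for three taxa in two samples.
site_a = [10, 0, 5]
site_b = [5, 5, 5]
print(bray_curtis(site_a, site_b))
```

A pairwise matrix of such values across all samples is the usual input to ordination or to tests of whether grouping factors, such as site or seep type, explain community differences.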
Title: Global Marine Cold Seep Metagenomes Reveal Diversity of Taxonomy, Metabolic Function, and Natural Products. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-10 DOI: 10.1093/gpbjnl/qzad006 (https://doi.org/10.1093/gpbjnl/qzad006)
Abstract The monkeypox virus (mpox virus, MPXV) epidemic in 2022 has posed a significant public health risk. Yet, the evolutionary principles of MPXV remain largely unknown. Here, we examined the evolutionary patterns of protein sequences and codon usage in MPXV. We first demonstrated a signal of positive selection in OPG027, specifically in the Clade I lineage of MPXV. Subsequently, we discovered accelerated protein sequence evolution over time in the variants responsible for the 2022 outbreak. Furthermore, we showed strong epistasis between amino acid substitutions located in different genes. The codon adaptation index (CAI) analysis revealed that MPXV genes tended to use more non-preferred codons than human genes, and that the CAI decreased over time and diverged between clades, with Clade I > IIa and IIb-A > IIb-B. While the decrease in fatality rate among the three groups aligned with the CAI pattern, it remains unclear whether this correlation was coincidental or whether the deoptimization of codon usage in MPXV led to a reduction in fatality rates. This study sheds new light on the mechanisms that govern the evolution of MPXV in human populations.
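For readers unfamiliar with the CAI, the sketch below shows the standard definition: the geometric mean, over a gene's codons, of each codon's relative adaptiveness, which is its frequency in a reference gene set divided by the frequency of the most used synonymous codon. This is a minimal illustration, not the authors' pipeline. The three-family codon table and the reference counts are hypothetical toy data, and skipping codons that have no weight is an assumed convention.

```python
import math

# Hypothetical toy subset of synonymous codon families (not the full genetic code).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}


def relative_adaptiveness(reference_counts):
    """Weight each codon by its count relative to the top codon of its family."""
    weights = {}
    for codons in SYNONYMS.values():
        top = max(reference_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # family absent from the reference set
        for c in codons:
            weights[c] = reference_counts.get(c, 0) / top
    return weights


def cai(sequence, weights):
    """Geometric mean of codon weights; codons without a positive weight are skipped."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical codon counts from a reference (host) gene set.
reference = {"AAG": 30, "AAA": 10, "GAG": 20, "GAA": 20, "TTC": 40, "TTT": 10}
w = relative_adaptiveness(reference)
print(cai("AAGGAGTTC", w))  # uses only the preferred codons
print(cai("AAAGAATTT", w))  # uses non-preferred codons, so scores lower
```

Under this definition, a gene that drifts toward non-preferred host codons has a lower CAI, which is the sense of "deoptimization" used in the abstract.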
Title: Molecular Evolution of Protein Sequences and Codon Usage in Monkeypox Viruses. Authors: Ke-jia Shan, Changcheng Wu, Xiaolu Tang, Roujian Lu, Yaling Hu, Wenjie Tan, Jian Lu. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-10 DOI: 10.1093/gpbjnl/qzad003 (https://doi.org/10.1093/gpbjnl/qzad003)
Binzhong Wang, Bin Wu, Xueqing Liu, Yacheng Hu, Yao Ming, Mingzhou Bai, Juanjuan Liu, Kan Xiao, Qingkai Zeng, Jing Yang, Hongqi Wang, Baifu Guo, Chun Tan, Zixuan Hu, Xun Zhao, Yanhong Li, Zhen Yue, Junpu Mei, Wei Jiang, Yuanjin Yang, Zhiyuan Li, Yong Gao, Lei Chen, Jianbo Jian, Hejun Du
Abstract The order Acipenseriformes, which includes sturgeons and paddlefishes, represents “living fossils” with complex genomes that are good models for understanding whole-genome duplication (WGD) and ploidy evolution in fishes. Here, we sequenced and assembled the first high-quality chromosome-level genome for the complex octoploid Acipenser sinensis (Chinese sturgeon), a critically endangered species that also represents a poorly understood ploidy group in Acipenseriformes. Our results show that A. sinensis is a complex autooctoploid species containing four kinds of octovalents (8n), a hexavalent (6n), two tetravalents (4n), and a divalent (2n). An analysis that takes delayed rediploidization into account reveals that the octoploid genome composition of Chinese sturgeon results from two rounds of homologous WGDs, and further provides insights into the timing of its ploidy evolution. This study provides the first octoploid genome resource of Acipenseriformes for understanding the ploidy compositions and evolutionary trajectories of polyploid fishes.
Title: Whole-genome Sequencing Reveals Autooctoploidy in Chinese Sturgeon and Its Evolutionary Trajectories. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-10 DOI: 10.1093/gpbjnl/qzad002 (https://doi.org/10.1093/gpbjnl/qzad002)
Shan-Ju Yeh, Shreya Paithankar, Ruoqiao Chen, Jing Xing, Mengying Sun, Ke Liu, Jiayu Zhou, Bin Chen
Abstract Gene expression profiling of new or modified cell lines has become routine; however, obtaining comprehensive molecular characterization and cellular responses for a variety of cell lines, including those derived from underrepresented groups, is not trivial when resources are minimal. Using gene expression to predict other measurements has been actively explored; however, its predictive power across different measurement types has not been systematically investigated. We evaluated commonly used machine learning methods and presented TransCell, a two-step deep transfer learning framework that utilized the knowledge derived from pan-cancer tumor samples to predict molecular features and responses. Among these models, TransCell has the best performance in predicting metabolite, gene effect score (or genetic dependency), and drug sensitivity, and has comparable performance in predicting mutation, copy number variation, and protein expression. Notably, TransCell improved the performance by over 50% in drug sensitivity prediction and achieved a correlation of 0.7 in gene effect score prediction. Furthermore, predicted drug sensitivities revealed potential repurposing candidates for 100 new pediatric cancer cell lines, and predicted gene effect scores reflected BRAF resistance in melanoma cell lines. Together, we investigated the predictive power of gene expression in six molecular measurement types and developed a web portal (http://apps.octad.org/transcell/) that enables the prediction of 352,000 genomic and cellular response features solely from gene expression profiles.
Title: TransCell: In silico Characterization of Genomic Landscape and Cellular Responses by Deep Transfer Learning. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-10 DOI: 10.1093/gpbjnl/qzad008 (https://doi.org/10.1093/gpbjnl/qzad008)
Abstract The microbiome plays a critical role in the process of conception and the outcomes of pregnancy. Disruptions in microbiome homeostasis in women of reproductive age can lead to various pregnancy complications, which significantly impact maternal and fetal health. Recent studies have associated the microbiome in the female reproductive tract (FRT) with assisted reproductive technology (ART) outcomes, and restoring microbiome balance has been shown to improve fertility in infertile couples. This review provides an overview of the role of the microbiome in female reproductive health, including its implications for pregnancy outcomes and ARTs. Additionally, recent advances in the use of microbial biomarkers as indicators of pregnancy disorders are summarized. A comprehensive understanding of the characteristics of the microbiome before and during pregnancy and its impact on reproductive health will greatly promote maternal and fetal health. Such knowledge can also contribute to the development of ARTs and microbiome-based interventions.
Title: Microbiome in Female Reproductive Health: Implications for Fertility and Assisted Reproductive Technologies. Authors: Liwen Xiao, Zhenqiang Zuo, Fangqing Zhao. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-10 DOI: 10.1093/gpbjnl/qzad005 (https://doi.org/10.1093/gpbjnl/qzad005)
Abstract Since its establishment in 2013, BioLiP has become a widely used resource for protein–ligand interactions. Nevertheless, several issues with it have become apparent over the past decade. For example, the protein–ligand interactions are represented in the form of single-chain-based tertiary structures, which may be inappropriate as many interactions involve multiple protein chains (known as quaternary structures). We sought to address these issues, resulting in Q-BioLiP, a comprehensive resource for quaternary structure-based protein–ligand interactions. The major features of Q-BioLiP include: (1) representing protein structures in the form of quaternary structures rather than single-chain-based tertiary structures; (2) pairing DNA/RNA chains properly rather than separating them; (3) providing both experimental and predicted binding affinities; (4) retaining both biologically relevant and irrelevant interactions to alleviate the problem of misjudging ligands’ biological relevance; and (5) developing a new quaternary structure-based algorithm for modelling protein–ligand complex structures. With these new features, Q-BioLiP is expected to be a valuable resource for studying biomolecule interactions, including protein–small molecule interaction, protein–metal ion interaction, protein–peptide interaction, protein–protein interaction, protein–DNA/RNA interaction, and RNA–small molecule interaction. Q-BioLiP is freely available at https://yanglab.qd.sdu.edu.cn/Q-BioLiP/.
Title: Q-BioLiP: A Comprehensive Resource for Quaternary Structure-based Protein–ligand Interactions. Authors: Hong Wei, Wenkai Wang, Zhenling Peng, Jianyi Yang. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-04 DOI: 10.1093/gpbjnl/qzae001 (https://doi.org/10.1093/gpbjnl/qzae001)
Jiang Hu, Zhuo Wang, Fan Liang, Shan-Lin Liu, Kai Ye, De-Peng Wang
Abstract The high-fidelity (HiFi) long-read sequencing technology developed by PacBio has greatly improved the base-level accuracy of genome assemblies. However, these assemblies still contain base-level errors, particularly within the error-prone regions of HiFi long reads. Existing genome polishing tools usually introduce overcorrections and haplotype switch errors when correcting errors in genomes assembled from HiFi long reads. Here we describe an upgraded genome polishing tool, NextPolish2, which can fix base errors remaining in those “highly accurate” genomes assembled from HiFi long reads without introducing excessive overcorrections and haplotype switch errors. We believe that NextPolish2 will help to further improve the accuracy of telomere-to-telomere (T2T) genomes. NextPolish2 is freely available at https://github.com/Nextomics/NextPolish2.
Title: NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads. Journal: Genomics, Proteomics & Bioinformatics. Pub Date: 2024-01-04 DOI: 10.1093/gpbjnl/qzad009 (https://doi.org/10.1093/gpbjnl/qzad009)
Base editing technology is being increasingly applied in genome engineering, but the current strategy for designing guide RNAs (gRNAs) relies substantially on empirical experience rather than a dependable and efficient in silico design. Furthermore, the pleiotropic effect of base editing on disease treatment remains unexplored, which prevents its further clinical usage. Here, we presented BExplorer, an integrated and comprehensive computational pipeline to optimize the design of gRNAs for 26 existing types of base editors in silico. Using BExplorer, we described its results for two types of mainstream base editors, BE3 and ABE7.10, and evaluated the pleiotropic effects of the corresponding base editing loci. BExplorer revealed 524 and 900 editable pathogenic single nucleotide polymorphism (SNP) loci in the human genome together with the selected optimized gRNAs for BE3 and ABE7.10, respectively. In addition, the impact of 707 edited pathogenic SNP loci following base editing on 131 diseases was systematically explored by revealing their pleiotropic effects, indicating that base editing should be carefully utilized given the potential pleiotropic effects. Collectively, the systematic exploration of optimized base editing gRNA design and the corresponding pleiotropic effects with BExplorer provides a computational basis for applying base editing in disease treatment.
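The abstract does not describe how BExplorer selects gRNAs. As a rough illustration of the underlying design problem only, the sketch below lists forward-strand 20-nt protospacers that are followed by an NGG PAM and that place a target base inside an editing window. The function name is hypothetical. The default window of positions 4 to 8, counted from the PAM-distal end, is the range commonly cited for BE3 and is an assumption here; a real design tool would also handle the reverse strand, bystander bases, and off-target scoring.

```python
def candidate_grnas(seq, target_index, window=(4, 8), spacer_len=20):
    """List forward-strand protospacers that place the target base in the window.

    seq          -- DNA sequence (5' to 3')
    target_index -- 0-based index of the base to be edited
    window       -- inclusive 1-based positions within the protospacer
                    (assumed default: 4-8)
    Returns a list of (protospacer, position_of_target_in_protospacer).
    """
    seq = seq.upper()
    hits = []
    # Each start must leave room for the protospacer plus a 3-nt PAM.
    for start in range(0, len(seq) - spacer_len - 2):
        pam = seq[start + spacer_len:start + spacer_len + 3]
        if pam[1:] != "GG":  # NGG PAM: any base followed by GG
            continue
        position = target_index - start + 1  # 1-based position in protospacer
        if window[0] <= position <= window[1]:
            hits.append((seq[start:start + spacer_len], position))
    return hits


# Hypothetical example: a target C at index 4, followed downstream by a TGG PAM.
protospacer = "TTTTCTTTTTTTTTTTTTTT"
example = protospacer + "TGG" + "AAAA"
print(candidate_grnas(example, target_index=4))
```

Narrowing the window (for example to positions 4 to 7, a range often quoted for ABE7.10) only requires changing the `window` argument.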
Title: Systematic Exploration of Optimized Base Editing gRNA Design and Pleiotropic Effects with BExplorer. Authors: Gongchen Zhang, Chenyu Zhu, Xiaohan Chen, Jifang Yan, Dongyu Xue, Zixuan Wei, Guohui Chuai, Qi Liu. Journal: Genomics, Proteomics & Bioinformatics, Volume 21, Issue 6, Pages 1237-1245. Pub Date: 2023-12-01 DOI: 10.1016/j.gpb.2022.06.005 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082405/pdf/
Pub Date: 2023-12-01 DOI: 10.1016/j.gpb.2022.08.006
Wenjun Li, Likun Wang, Xiaofang Li, Xin Zheng, Michael F. Cohen, Yong-Xin Liu
Exploring the natural diversity of functional genes/proteins from environmental DNA in high throughput remains challenging. In this study, we developed a sequence-based functional metagenomics procedure for mining the diversity of copper (Cu) resistance gene copA in global microbiomes, by combining the metagenomic assembly technology, local BLAST, evolutionary trace analysis (ETA), chemical synthesis, and conventional functional genomics. In total, 87 metagenomes were collected from a public database and subjected to copA detection, resulting in 93,899 hits. Manual curation of 1214 hits of high confidence led to the retrieval of 517 unique CopA candidates, which were further subjected to ETA. Eventually, 175 novel copA sequences of high quality were discovered. Phylogenetic analysis showed that almost all these putative CopA proteins were distantly related to known CopA proteins, with 55 sequences from totally unknown species. Ten novel and three known copA genes were chemically synthesized for further functional genomic tests using the Cu-sensitive Escherichia coli (ΔcopA). The growth test and Cu uptake determination showed that five novel clones had positive effects on host Cu resistance and uptake. One recombinant harboring copA-like 15 (copAL15) successfully restored Cu resistance of the host with a substantially enhanced Cu uptake. Two novel copA genes were fused with the gfp gene and expressed in E. coli for microscopic observation. Imaging results showed that they were successfully expressed and their proteins were localized to the membrane. The results here greatly expand the diversity of known CopA proteins, and the sequence-based procedure developed overcomes biases in length, screening methods, and abundance of conventional functional metagenomics.
Title: Sequence-based Functional Metagenomics Reveals Novel Natural Diversity of Functional CopA in Environmental Microbiomes. Journal: Genomics, Proteomics & Bioinformatics, Volume 21, Issue 6, Pages 1182-1194. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082258/pdf/