Pub Date: 2023-12-01; Epub Date: 2022-12-20; DOI: 10.1016/j.gpb.2022.12.003
Zhi-Xue Yang, Ya-Wen Fu, Juan-Juan Zhao, Feng Zhang, Si-Ang Li, Mei Zhao, Wei Wen, Lei Zhang, Tao Cheng, Jian-Ping Zhang, Xiao-Bing Zhang
A series of clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR associated protein 9 (Cas9) systems have been engineered for genome editing. The most widely used Cas9 nucleases are SpCas9 from Streptococcus pyogenes and SaCas9 from Staphylococcus aureus. However, a comparison of their detailed gene editing outcomes is still lacking. By characterizing the editing outcomes of 11 sites in human induced pluripotent stem cells (iPSCs) and K562 cells, we found that SaCas9 could edit the genome with greater efficiencies than SpCas9. We also compared the effects of spacer lengths of single-guide RNAs (sgRNAs; 18-21 nt for SpCas9 and 19-23 nt for SaCas9) and found that the optimal spacer lengths were 20 nt and 21 nt for SpCas9 and SaCas9, respectively. However, the optimal spacer length for a particular sgRNA was 18-21 nt for SpCas9 and 21-22 nt for SaCas9. Furthermore, SpCas9 exhibited a more substantial bias than SaCas9 for nonhomologous end-joining (NHEJ) +1 insertion at the fourth nucleotide upstream of the protospacer adjacent motif (PAM), which is characteristic of a staggered cut. Accordingly, editing with SaCas9 led to higher efficiencies of NHEJ-mediated double-stranded oligodeoxynucleotide (dsODN) insertion or homology-directed repair (HDR)-mediated adeno-associated virus serotype 6 (AAV6) donor knock-in. Finally, GUIDE-seq analysis revealed that SaCas9 exhibited significantly reduced off-target effects compared with SpCas9. Our work indicates the superior performance of SaCas9 over SpCas9 in transgene integration-based therapeutic gene editing and the necessity of identifying the optimal spacer length to achieve the desired editing results.
Superior Fidelity and Distinct Editing Outcomes of SaCas9 Compared with SpCas9 in Genome Editing. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 1206-1220. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082263/pdf/
Pub Date: 2023-12-01; Epub Date: 2023-04-20; DOI: 10.1016/j.gpb.2023.04.002
Ann-Yae Na, Hyojin Lee, Eun Ki Min, Sanjita Paudel, So Young Choi, HyunChae Sim, Kwang-Hyeon Liu, Ki-Tae Kim, Jong-Sup Bae, Sangkyu Lee
Recently developed technologies that allow the analysis of individual omics layers have provided unbiased insight into ongoing disease processes. However, it remains challenging to specify the study design for subsequent integration strategies that can associate sepsis pathophysiology with clinical outcomes. Here, we conducted a time-dependent multi-omics integration (TDMI) in a sepsis-associated liver dysfunction (SALD) model. We successfully deduced the relation of the Toll-like receptor 4 (TLR4) pathway with SALD. Although TLR4 is a critical factor in sepsis progression, it was identified only in the TDMI analysis and not in the single-omics analyses. This finding indicates that the TDMI-based approach is more advantageous than single-omics analyses for exploring the underlying pathophysiological mechanism of SALD. Furthermore, the TDMI-based approach can be an ideal paradigm for insightful biological interpretations of multi-omics datasets that will potentially reveal novel insights into basic biology, health, and diseases, thus allowing the identification of promising candidates for therapeutic strategies.
Novel Time-dependent Multi-omics Integration in Sepsis-associated Liver Dysfunction. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 1101-1116. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082264/pdf/
Pub Date: 2023-10-01; Epub Date: 2023-04-17; DOI: 10.1016/j.gpb.2023.03.001
Lin-Fang Ju, Heng-Ji Xu, Yun-Gui Yang, Ying Yang
During mammalian preimplantation development, a totipotent zygote undergoes several cell cleavages and two rounds of cell fate determination, ultimately forming a mature blastocyst. Along with compaction, the establishment of apicobasal cell polarity breaks the symmetry of an embryo and guides subsequent cell fate choice. Although the lineage segregation of the inner cell mass (ICM) and trophectoderm (TE) is the first sign of cell differentiation, several molecules have been shown to bias the early cell fate through their inter-cellular variations at much earlier stages, including the 2- and 4-cell stages. The underlying mechanisms of early cell fate determination have long been an important research topic. In this review, we summarize the molecular events that occur during early embryogenesis, as well as the current understanding of their regulatory roles in cell fate decisions. Moreover, as powerful tools for early embryogenesis research, single-cell omics techniques have been applied to both mouse and human preimplantation embryos and have contributed to the discovery of cell fate regulators. Here, we summarize their applications in the research of preimplantation embryos, and provide new insights and perspectives on cell fate regulation.
Omics Views of Mechanisms for Cell Fate Determination in Early Mammalian Development. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 950-961. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928378/pdf/
Pub Date: 2023-10-01; Epub Date: 2023-06-22; DOI: 10.1016/j.gpb.2023.06.001
Yaojun Wang, Shiwei Sun
Revolutionizing Antibody Discovery: An Innovative AI Model for Generating Robust Libraries. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 910-912. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928364/pdf/
Pub Date: 2023-10-01; Epub Date: 2023-02-14; DOI: 10.1016/j.gpb.2023.02.004
Wenbin Li, Lin Gao, Xin Yi, Shuangfeng Shi, Jie Huang, Leming Shi, Xiaoyan Zhou, Lingying Wu, Jianming Ying
Defects in genes involved in the DNA damage response cause homologous recombination repair deficiency (HRD). HRD is found in a subgroup of cancer patients for several tumor types, and it has a clinical relevance to cancer prevention and therapies. Accumulating evidence has identified HRD as a biomarker for assessing the therapeutic response of tumor cells to poly(ADP-ribose) polymerase inhibitors and platinum-based chemotherapies. Nevertheless, the biology of HRD is complex, and its applications and the benefits of different HRD biomarker assays are controversial. This is primarily due to inconsistencies in HRD assessments and definitions (gene-level tests, genomic scars, mutational signatures, or a combination of these methods) and difficulties in assessing the contribution of each genomic event. Therefore, we aim to review the biological rationale and clinical evidence of HRD as a biomarker. This review provides a blueprint for the standardization and harmonization of HRD assessments.
Patient Assessment and Therapy Planning Based on Homologous Recombination Repair Deficiency. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 962-975. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928375/pdf/
Antibody leads must fulfill multiple desirable properties to be clinical candidates. Primarily due to the low throughput of experimental procedures, the need for such multi-property optimization creates a bottleneck in preclinical antibody discovery and development, because addressing one issue usually causes another. We developed a reinforcement learning (RL) method, named AB-Gen, for antibody library design using a generative pre-trained transformer (GPT) as the policy network of the RL agent. We showed that this model can learn the antibody space of heavy chain complementarity determining region 3 (CDRH3) and generate sequences with similar property distributions. In addition, when using human epidermal growth factor receptor-2 (HER2) as the target, the agent model of AB-Gen was able to generate novel CDRH3 sequences that fulfill multi-property constraints. In total, 509 generated sequences passed all property filters, and three highly conserved residues were identified. The importance of these residues was further demonstrated by molecular dynamics simulations, confirming that the agent model was capable of grasping important information in this complex optimization task. Overall, the AB-Gen method is able to design novel antibody sequences with a higher success rate than the traditional propose-then-filter approach. It has the potential to be used in practical antibody design, thus empowering the antibody discovery and development process. The source code of AB-Gen is freely available at Zenodo (https://doi.org/10.5281/zenodo.7657016) and BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007341).
AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning. Xiaopeng Xu, Tiantian Xu, Juexiao Zhou, Xingyu Liao, Ruochi Zhang, Yu Wang, Lu Zhang, Xin Gao. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 1043-1053. Pub Date: 2023-10-01; DOI: 10.1016/j.gpb.2023.03.004. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928431/pdf/
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start from assuming a probability distribution of protein structures given a target sequence and then find the most likely structure, while computer scientists formulate protein structure prediction as an optimization problem - finding the structural conformation with the lowest energy or minimizing the difference between predicted structure and native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts for protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, the algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge on protein folding are still highly desired.
Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms. Bin Huang, Lupeng Kong, Chao Wang, Fusong Ju, Qi Zhang, Jianwei Zhu, Tiansu Gong, Haicang Zhang, Chungong Yu, Wei-Mou Zheng, Dongbo Bu. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 913-925. Pub Date: 2023-10-01; DOI: 10.1016/j.gpb.2022.11.014. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928435/pdf/
Pub Date: 2023-10-01; Epub Date: 2022-12-23; DOI: 10.1016/j.gpb.2022.12.004
Lina Ma, Dong Zou, Lin Liu, Huma Shireen, Amir A Abbasi, Alex Bateman, Jingfa Xiao, Wenming Zhao, Yiming Bao, Zhang Zhang
Biological databases serve as a global fundamental infrastructure for the worldwide scientific community, which dramatically aid the transformation of big data into knowledge discovery and drive significant innovations in a wide range of research fields. Given the rapid data production, biological databases continue to increase in size and importance. To build a catalog of worldwide biological databases, we curate a total of 5825 biological databases from 8931 publications, which are geographically distributed in 72 countries/regions and developed by 1975 institutions (as of September 20, 2022). We further devise a z-index, a novel index to characterize the scientific impact of a database, and rank all these biological databases as well as their hosting institutions and countries in terms of citation and z-index. Consequently, we present a series of statistics and trends of worldwide biological databases, yielding a global perspective to better understand their status and impact for life and health sciences. An up-to-date catalog of worldwide biological databases, as well as their curated meta-information and derived statistics, is publicly available at Database Commons (https://ngdc.cncb.ac.cn/databasecommons/).
Database Commons: A Catalog of Worldwide Biological Databases. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 1054-1058. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928426/pdf/
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the persistent coronavirus disease 2019 (COVID-19) pandemic, which has resulted in millions of deaths worldwide and brought an enormous public health and global economic burden. The recurring global wave of infections has been exacerbated by growing variants of SARS-CoV-2. In this study, the virological characteristics of the original SARS-CoV-2 strain and its variants of concern (VOCs; including Alpha, Beta, and Delta) in vitro, as well as differential transcriptomic landscapes in multiple organs (lung, right ventricle, blood, cerebral cortex, and cerebellum) from the infected rhesus macaques, were elucidated. The original strain of SARS-CoV-2 caused a stronger innate immune response in host cells, and its VOCs markedly increased the levels of subgenomic RNAs, such as N, Orf9b, Orf6, and Orf7ab, which are known as the innate immune antagonists and the inhibitors of antiviral factors. Intriguingly, the original SARS-CoV-2 strain and Alpha variant induced larger alteration of RNA abundance in tissues of rhesus monkeys than Beta and Delta variants did. Moreover, a hyperinflammatory state and active immune response were shown in the right ventricles of rhesus monkeys by the up-regulation of inflammation- and immune-related RNAs. Furthermore, peripheral blood may mediate signaling transmission among tissues to coordinate the molecular changes in the infected individuals. Collectively, these data provide insights into the pathogenesis of COVID-19 at the early stage of infection by the original SARS-CoV-2 strain and its VOCs.
Differential Transcriptomic Landscapes of SARS-CoV-2 Variants in Multiple Organs from Infected Rhesus Macaques. Tingfu Du, Chunchun Gao, Shuaiyao Lu, Qianlan Liu, Yun Yang, Wenhai Yu, Wenjie Li, Yong Qiao Sun, Cong Tang, Junbin Wang, Jiahong Gao, Yong Zhang, Fangyu Luo, Ying Yang, Yun-Gui Yang, Xiaozhong Peng. Genomics, Proteomics & Bioinformatics (IF 11.5), pp. 1014-1029. Pub Date: 2023-10-01; DOI: 10.1016/j.gpb.2023.06.002. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928377/pdf/
Pub Date: 2023-08-01 Epub Date: 2023-02-11 DOI: 10.1016/j.gpb.2023.02.002
Feng Yu, Huanhuan Qi, Li Gao, Sen Luo, Rebecca Njeri Damaris, Yinggen Ke, Wenhua Wu, Pingfang Yang
Transcriptome analysis based on high-throughput sequencing of a cDNA library has been widely applied to functional genomic studies. However, the cDNA dependence of most RNA sequencing techniques constrains their ability to detect base modifications on RNA, which are important elements of the post-transcriptional regulation of gene expression. To comprehensively profile the N6-methyladenosine (m6A) and N5-methylcytosine (m5C) modifications on RNA, direct RNA sequencing (DRS) using the latest Oxford Nanopore Technology was applied to analyze the transcriptome of six tissues in rice. Approximately 94 million reads were generated, with an average length ranging from 619 nt to 1013 nt, and a total of 45,707 transcripts across 34,763 genes were detected. Expression profiles of transcripts at the isoform level were quantified among tissues. Transcriptome-wide mapping of m6A and m5C demonstrated that both modifications exhibited tissue-specific characteristics. The transcripts with m6A modifications tended to be modified by m5C as well, and the transcripts with modifications presented higher expression levels along with shorter poly(A) tails than transcripts without modifications, suggesting the complexity of gene expression regulation. Gene Ontology analysis demonstrated that m6A- and m5C-modified transcripts were involved in central metabolic pathways related to the life cycle, with modifications on the target genes selected in a tissue-specific manner. Furthermore, most modified sites were located within quantitative trait loci that control important agronomic traits, highlighting the value of cloning functional loci. The results provide new insights into the complexity of expression regulation and a data resource for the transcriptome and epitranscriptome, improving our understanding of the rice genome.
{"title":"Identifying RNA Modifications by Direct RNA Sequencing Reveals Complexity of Epitranscriptomic Dynamics in Rice.","authors":"Feng Yu, Huanhuan Qi, Li Gao, Sen Luo, Rebecca Njeri Damaris, Yinggen Ke, Wenhua Wu, Pingfang Yang","doi":"10.1016/j.gpb.2023.02.002","DOIUrl":"10.1016/j.gpb.2023.02.002","url":null,"abstract":"<p><p>Transcriptome analysis based on high-throughput sequencing of a cDNA library has been widely applied to functional genomic studies. However, the cDNA dependence of most RNA sequencing techniques constrains their ability to detect base modifications on RNA, which is an important element for the post-transcriptional regulation of gene expression. To comprehensively profile the N<sup>6</sup>-methyladenosine (m<sup>6</sup>A) and N<sup>5</sup>-methylcytosine (m<sup>5</sup>C) modifications on RNA, direct RNA sequencing (DRS) using the latest Oxford Nanopore Technology was applied to analyze the transcriptome of six tissues in rice. Approximately 94 million reads were generated, with an average length ranging from 619 nt to 1013 nt, and a total of 45,707 transcripts across 34,763 genes were detected. Expression profiles of transcripts at the isoform level were quantified among tissues. Transcriptome-wide mapping of m<sup>6</sup>A and m<sup>5</sup>C demonstrated that both modifications exhibited tissue-specific characteristics. The transcripts with m<sup>6</sup>A modifications tended to be modified by m<sup>5</sup>C, and the transcripts with modifications presented higher expression levels along with shorter poly(A) tails than transcripts without modifications, suggesting the complexity of gene expression regulation. Gene Ontology analysis demonstrated that m<sup>6</sup>A- and m<sup>5</sup>C-modified transcripts were involved in central metabolic pathways related to the life cycle, with modifications on the target genes selected in a tissue-specific manner. 
Furthermore, most modified sites were located within quantitative trait loci that control important agronomic traits, highlighting the value of cloning functional loci. The results provide new insights into the expression regulation complexity and data resource of the transcriptome and epitranscriptome, improving our understanding of the rice genome.</p>","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":" ","pages":"788-804"},"PeriodicalIF":11.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10787127/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10695747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}