Zijie Jiang, Zhixiang Peng, Zhaoyuan Wei, Jiahe Sun, Yongjiang Luo, Lingzi Bie, Guoqing Zhang, Yi Wang
The application of high-throughput chromosome conformation capture (Hi-C) technology enables the construction of chromosome-level assemblies. However, the correction of errors and the anchoring of sequences to chromosomes in the assembly remain significant challenges. In this study, we developed a deep learning-based method, AutoHiC, to address the challenges in chromosome-level genome assembly by enhancing contiguity and accuracy. Conventional Hi-C-aided scaffolding often requires manual refinement, whereas AutoHiC uses Hi-C data for automated workflows and iterative error correction. Trained on data from more than 300 species, AutoHiC achieved an average error detection accuracy exceeding 90%. The benchmarking results confirmed its significant impact on genome contiguity and error correction. The innovative approach and comprehensive results of AutoHiC constitute a breakthrough in automated error detection, promising more accurate genome assemblies for advancing genomics research.
{"title":"A deep learning-based method enables the automatic and accurate assembly of chromosome-level genomes","authors":"Zijie Jiang, Zhixiang Peng, Zhaoyuan Wei, Jiahe Sun, Yongjiang Luo, Lingzi Bie, Guoqing Zhang, Yi Wang","doi":"10.1093/nar/gkae789","DOIUrl":"https://doi.org/10.1093/nar/gkae789","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
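The AutoHiC abstract reports gains in genome contiguity without naming a metric. As a rough illustration only, the sketch below computes N50, a commonly used contiguity summary; choosing N50 here is an assumption, and the scaffold lengths are hypothetical values, not results from the paper.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical lengths (arbitrary units), not data from the paper.
before_correction = [40, 30, 20, 10]
after_correction = [70, 30]  # same total, two scaffolds joined

print(n50(before_correction))  # 30
print(n50(after_correction))   # 70
```

Joining scaffolds leaves the total size unchanged but raises N50, which is why the metric is often quoted when scaffolding methods are compared.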
Most DNA scanning proteins uniquely recognize their cognate sequence motif and slide on DNA assisted by some sort of clamping interface. The pioneer transcription factors that control cell fate in eukaryotes must forgo both elements to gain access to DNA in naked and chromatin forms; thus, whether or how these factors scan naked DNA is unknown. Here, we use single-molecule techniques to investigate naked DNA scanning by the Engrailed homeodomain (enHD) as a paradigm of highly promiscuous recognition and an open DNA binding interface. We find that enHD scans naked DNA quite effectively, about 200,000-fold faster than expected for a continuous promiscuous slide. To do so, enHD scans about 675 bp of DNA in 100 ms and then redeploys stochastically to another location 530 bp away in just 10 ms. During the scanning phase, enHD alternates between slow- and medium-paced modes every 3 and 40 ms, respectively. We also find that enHD binds nucleosomes, and does so with enhanced affinity relative to naked DNA. Our results demonstrate that pioneer-like transcription factors can in principle both target nucleosomes and scan active DNA efficiently. The hybrid scanning mechanism used by enHD appears particularly well suited for the highly complex genomic signals of eukaryotic cells.
{"title":"How to scan naked DNA using promiscuous recognition and no clamping: a model for pioneer transcription factors","authors":"Rama Reddy Goluguri, Catherine Ghosh, Joshua Quintong, Mourad Sadqi, Victor Muñoz","doi":"10.1093/nar/gkae790","DOIUrl":"https://doi.org/10.1093/nar/gkae790","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
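The scan-and-redeploy figures quoted in the enHD abstract can be turned into per-second rates with a few lines of arithmetic. The sketch below assumes, as a simplification not stated in the abstract, that one scanning phase and one relocation strictly alternate; the function and key names are illustrative.

```python
# Values taken from the abstract.
SCAN_BP = 675      # bp covered during one scanning phase
SCAN_S = 0.100     # duration of one scanning phase (s)
JUMP_BP = 530      # distance of one stochastic relocation (bp)
JUMP_S = 0.010     # duration of one relocation (s)


def cycle_summary():
    """Summarize one scan + relocation cycle (assumed strictly alternating)."""
    cycle_s = SCAN_S + JUMP_S
    return {
        "scan_rate_bp_per_s": SCAN_BP / SCAN_S,       # rate while scanning
        "scanned_bp_per_s_overall": SCAN_BP / cycle_s,  # averaged over the cycle
        "fraction_time_scanning": SCAN_S / cycle_s,
        "jump_bp": JUMP_BP,
    }


for key, value in cycle_summary().items():
    print(f"{key}: {value:.3f}")
```

Under that assumption the protein scans at about 6750 bp/s while bound in scanning mode, inspects roughly 6100 bp of DNA per second overall, and spends about 91% of its time scanning.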
Microbial communities usually harbor a mix of bacteria, archaea, plasmids, viruses and microeukaryotes. Within these communities, viruses, plasmids, and microeukaryotes coexist in relatively low abundance, yet they engage in intricate interactions with bacteria. Moreover, viruses and plasmids, as mobile genetic elements, play important roles in horizontal gene transfer and the development of antibiotic resistance within microbial populations. However, due to the difficulty of identifying viruses, plasmids, and microeukaryotes in microbial communities, our understanding of these minor classes lags behind that of bacteria and archaea. Recently, several classifiers have been developed to separate one or more minor classes from bacteria and archaea in metagenome assemblies. However, these classifiers often overlook the issue of class imbalance, leading to low precision in identifying the minor classes. Here, we developed a classifier called 4CAC that is able to identify viruses, plasmids, microeukaryotes, and prokaryotes simultaneously from metagenome assemblies. 4CAC generates an initial four-way classification using several sequence length-adjusted XGBoost models and further improves the classification using the assembly graph. Evaluation on simulated and real metagenome datasets demonstrates that 4CAC substantially outperforms existing classifiers and combinations thereof on short reads. On long reads, it also shows an advantage unless the abundance of the minor classes is very low. 4CAC runs 1–2 orders of magnitude faster than the other classifiers. The 4CAC software is available at https://github.com/Shamir-Lab/4CAC.
{"title":"4CAC: 4-class classifier of metagenome contigs using machine learning and assembly graphs","authors":"Lianrong Pu, Ron Shamir","doi":"10.1093/nar/gkae799","DOIUrl":"https://doi.org/10.1093/nar/gkae799","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
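The 4CAC abstract says an initial four-way classification is refined using the assembly graph, but gives no detail. The sketch below is not the 4CAC algorithm; it is a deliberately simple stand-in, assuming that an unlabeled contig inherits a class only when all of its already-labeled graph neighbors agree. The function name, contig names, and edges are hypothetical.

```python
def refine_labels(labels, edges):
    """Propagate class labels over an assembly graph.

    labels: dict mapping contig id -> class name, or None if unclassified.
    edges:  iterable of (contig_a, contig_b) pairs; both ids must be in labels.
    An unclassified contig takes a class only if every labeled neighbor
    carries that same class (illustrative rule, not the published method).
    """
    neighbors = {contig: set() for contig in labels}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    refined = dict(labels)
    changed = True
    while changed:  # repeat until no further contig can be labeled
        changed = False
        for contig, label in list(refined.items()):
            if label is not None:
                continue
            seen = {refined[n] for n in neighbors[contig] if refined[n] is not None}
            if len(seen) == 1:
                refined[contig] = seen.pop()
                changed = True
    return refined


initial = {"c1": "plasmid", "c2": None, "c3": None, "c4": "prokaryote", "c5": None}
graph = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c5", "c1")]
print(refine_labels(initial, graph))
```

Here `c2` and `c3` become `plasmid` through their chain to `c1`, while `c5` stays unclassified because its neighbors disagree. Leaving conflicting cases unlabeled trades recall for precision, which matters when minor classes are rare.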
Cytoplasmic polyhedrosis viruses (CPVs), like other members of the order Reovirales, produce viroplasms, hubs of viral assembly that shield them from host immunity. Our study investigates the potential role of NSP9, a nucleic acid-binding non-structural protein encoded by CPVs, in viroplasm biogenesis. We determined the crystal structure of the NSP9 core (NSP9ΔC), which shows a dimeric organization topologically similar to the P9-1 homodimers of plant reoviruses. The disordered C-terminal region of NSP9 facilitates oligomerization but is dispensable for nucleic acid binding. NSP9 robustly binds to single- and double-stranded nucleic acids, regardless of RNA or DNA origin. Mutagenesis studies further confirmed that the dimeric form of NSP9 is critical for nucleic acid binding due to positively charged residues that form a tunnel during homodimerization. Gel migration assays reveal a unique nucleic acid binding pattern, with the sequential appearance of two distinct complexes dependent on protein concentration. The similar gel migration pattern shared by NSP9 and rotavirus NSP3, coupled with its structural resemblance to P9-1, hints at a potential role in translational regulation or viral genome packaging, which may be linked to viroplasm. This study advances our understanding of viroplasm biogenesis and Reovirales replication, providing insights into potential antiviral drug targets.
{"title":"Crystal structure and nucleic acid binding mode of CPV NSP9: implications for viroplasm in Reovirales","authors":"Yeda Wang, Hangtian Guo, Yuhao Lu, Wanbin Yang, Tinghan Li, Xiaoyun Ji","doi":"10.1093/nar/gkae803","DOIUrl":"https://doi.org/10.1093/nar/gkae803","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
Chun Yang, Pratik Basnet, Samah Sharmin, Hui Shen, Craig D Kaplan, Kenji Murakami
RNA polymerase II (pol II) initiates transcription from transcription start sites (TSSs) located ∼30–35 bp downstream of the TATA box in metazoans, whereas in the yeast Saccharomyces cerevisiae, pol II scans for TSSs located further downstream, ∼40–120 bp downstream of the TATA box. Previously, we found that removal of the kinase module TFIIK (Kin28–Ccl1–Tfb3) from TFIIH shifts the TSS in a yeast in vitro system upstream to the location observed in metazoans, and that addition of recombinant Tfb3 back to TFIIH-ΔTFIIK restores the downstream TSS usage. Here, we report that this biochemical activity of yeast TFIIK in TSS scanning is attributable to the Tfb3 RING domain at the interface with pol II in the pre-initiation complex (PIC). In particular, swapping Tfb3 Pro51 (a residue conserved among all fungi) with Ala or Ser, as in MAT1, the metazoan homolog of Tfb3, confers an upstream TSS shift in vitro in a manner similar to the removal of TFIIK. Yeast genetic analysis suggests that both Pro51 and Arg64 of Tfb3 are required to maintain the stability of the Tfb3–pol II interface in the PIC. Cryo-electron microscopy analysis of a yeast PIC lacking TFIIK reveals considerable variability in the orientation of TFIIH, which impairs TSS scanning after promoter opening.
{"title":"Transcription start site scanning requires the fungi-specific hydrophobic loop of Tfb3","authors":"Chun Yang, Pratik Basnet, Samah Sharmin, Hui Shen, Craig D Kaplan, Kenji Murakami","doi":"10.1093/nar/gkae805","DOIUrl":"https://doi.org/10.1093/nar/gkae805","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
Giulia Nicoletto, Marianna Terreri, Ilaria Maurizio, Emanuela Ruggiero, Filippo M Cernilogar, Christine A Vaine, Maria Vittoria Cottini, Irina Shcherbakova, Ellen B Penney, Irene Gallina, David Monchaud, D Cristopher Bragg, Gunnar Schotta, Sara N Richter
G-quadruplexes (G4s) are non-canonical nucleic acid structures that form in guanine (G)-rich genomic regions. X-linked dystonia parkinsonism (XDP) is an inherited neurodegenerative disease in which a SINE–VNTR–Alu (SVA) retrotransposon, characterised by amplification of a G-rich repeat, is inserted into the coding sequence of TAF1, a key partner of RNA polymerase II. XDP SVA alters TAF1 expression, but the cause of this outcome in XDP remains unknown. To assess whether G4s form in XDP SVA and affect TAF1 expression, we first characterised bioinformatically predicted XDP SVA G4s in vitro. We next showed that highly stable G4s can form and stop polymerase amplification at the SVA region from patient-derived fibroblasts and neural progenitor cells. Using chromatin immunoprecipitation (ChIP) with an anti-G4 antibody coupled to sequencing or quantitative PCR, we showed that XDP SVA G4s are folded even when embedded in a chromatin context in patient-derived cells. Using the G4 ligands BRACO-19 and quarfloxin and total RNA-sequencing analysis, we showed that stabilisation of the XDP SVA G4s reduces TAF1 transcripts downstream of and around the SVA, and increases upstream transcripts, while destabilisation using the G4 unfolder PhpC increases TAF1 transcripts. Our data indicate that G4 formation in the XDP SVA is a major cause of aberrant TAF1 expression, opening the way for the development of strategies to unfold G4s and potentially target the disease.
{"title":"G-quadruplexes in an SVA retrotransposon cause aberrant TAF1 gene expression in X-linked dystonia parkinsonism","authors":"Giulia Nicoletto, Marianna Terreri, Ilaria Maurizio, Emanuela Ruggiero, Filippo M Cernilogar, Christine A Vaine, Maria Vittoria Cottini, Irina Shcherbakova, Ellen B Penney, Irene Gallina, David Monchaud, D Cristopher Bragg, Gunnar Schotta, Sara N Richter","doi":"10.1093/nar/gkae797","DOIUrl":"https://doi.org/10.1093/nar/gkae797","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
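The XDP abstract mentions bioinformatically predicted G4s without naming the predictor. One widely used heuristic, shown below as an assumption rather than the authors' method, searches for four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The test sequence is made up.

```python
import re

# Classic putative-quadruplex pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PATTERN = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping putative G4 motifs.

    Only the given strand is scanned; G4s on the opposite strand would
    appear here as C-rich runs and are not reported by this sketch.
    """
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(sequence)]


print(find_g4_motifs("TTGGGAGGGTGGGAGGGTT"))
# [(2, 17, 'GGGAGGGTGGGAGGG')]
```

A pattern match only flags a candidate; as the abstract notes, formation still has to be confirmed experimentally in vitro and in cells.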
Ulli Rothweiler, Sigurd Eidem Gundesø, Emma Wu Mikalsen, Steingrim Svenning, Mahavir Singh, Francis Combes, Frida J Pettersson, Antonia Mangold, Yvonne Piotrowski, Felix Schwab, Olav Lanes, Bernd Ketelsen Striberny
Over the past five decades, DNA restriction enzymes have revolutionized biotechnology. While these enzymes are widely used in DNA research and DNA engineering, the emerging field of RNA and mRNA therapeutics requires sequence-specific RNA endoribonucleases. Here, we describe EcoToxN1, a member of the type III toxin-antitoxin family of sequence-specific RNA endoribonucleases, and its use in RNA and mRNA analysis. This enzyme recognizes a specific pentamer in a single-stranded RNA and cleaves the RNA within this sequence. The enzyme is neither dependent on annealing of guide RNA or DNA oligos to the template nor does it require magnesium. Furthermore, it performs over a wide range of temperatures. With its unique functions and characteristics, EcoToxN1 can be classified as an RNA restriction enzyme. EcoToxN1 enables new workflows in RNA analysis and biomanufacturing, meeting the demand for faster, cheaper, and more robust analysis methods.
{"title":"Using nucleolytic toxins as restriction enzymes enables new RNA applications","authors":"Ulli Rothweiler, Sigurd Eidem Gundesø, Emma Wu Mikalsen, Steingrim Svenning, Mahavir Singh, Francis Combes, Frida J Pettersson, Antonia Mangold, Yvonne Piotrowski, Felix Schwab, Olav Lanes, Bernd Ketelsen Striberny","doi":"10.1093/nar/gkae779","DOIUrl":"https://doi.org/10.1093/nar/gkae779","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-14","publicationTypes":"Journal Article"}
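The EcoToxN1 abstract describes an enzyme that recognizes a specific pentamer in single-stranded RNA and cuts within it, but it does not give the pentamer or the cut position. The in-silico digest below therefore uses a placeholder site (`ACGUA`) and a placeholder cut offset; both are hypothetical and chosen only to show how such a digest could be simulated.

```python
def digest(rna, site, cut_offset):
    """Cut `rna` inside every occurrence of `site`.

    cut_offset is the number of site nucleotides left on the 5' fragment;
    it must fall strictly inside the site, matching "cleaves within".
    """
    if not 0 < cut_offset < len(site):
        raise ValueError("cut_offset must fall inside the recognition site")
    fragments = []
    start = 0
    pos = rna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(rna[start:cut])
        start = cut
        pos = rna.find(site, pos + 1)
    fragments.append(rna[start:])
    return fragments


# Placeholder site and offset; NOT the real EcoToxN1 recognition sequence.
HYPOTHETICAL_SITE = "ACGUA"
HYPOTHETICAL_OFFSET = 2

print(digest("GGACGUACCACGUAGG", HYPOTHETICAL_SITE, HYPOTHETICAL_OFFSET))
# ['GGAC', 'GUACCAC', 'GUAGG']
```

Predicted fragment lengths from a digest like this could be compared with an observed fragment pattern, in the same way DNA restriction maps are used.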
Yue Bi, Fuyi Li, Cong Wang, Tong Pan, Chen Davidovich, Geoffrey I Webb, Jiangning Song
MicroRNAs (miRNAs) are short non-coding RNAs involved in various cellular processes, playing a crucial role in gene regulation. Identifying miRNA targets remains a central challenge and is pivotal for elucidating the complex gene regulatory networks. Traditional computational approaches have predominantly focused on identifying miRNA targets through perfect Watson–Crick base pairings within the seed region, referred to as canonical sites. However, emerging evidence suggests that perfect seed matches are not a prerequisite for miRNA-mediated regulation, underscoring the importance of also recognizing imperfect, or non-canonical, sites. To address this challenge, we propose Mimosa, a new computational approach that employs the Transformer framework to enhance the prediction of miRNA targets. Mimosa distinguishes itself by integrating contextual, positional and base-pairing information to capture in-depth attributes, thereby improving its predictive capabilities. Its unique ability to identify non-canonical base-pairing patterns makes Mimosa a standout model, reducing the reliance on pre-selecting candidate targets. Mimosa achieves superior performance in gene-level predictions and also shows impressive performance in site-level predictions across various non-human species through extensive benchmarking tests. To facilitate research efforts in miRNA targeting, we have developed an easy-to-use web server for comprehensive end-to-end predictions, which is publicly available at http://monash.bioweb.cloud.edu.au/Mimosa.
{"title":"Advancing microRNA target site prediction with transformer and base-pairing patterns","authors":"Yue Bi, Fuyi Li, Cong Wang, Tong Pan, Chen Davidovich, Geoffrey I Webb, Jiangning Song","doi":"10.1093/nar/gkae782","DOIUrl":"https://doi.org/10.1093/nar/gkae782","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-14","publicationTypes":"Journal Article"}
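The Mimosa abstract contrasts its approach with traditional methods that look for perfect Watson–Crick pairing in the seed region (canonical sites). The sketch below illustrates that traditional baseline only, not Mimosa itself. It assumes the seed spans miRNA nucleotides 2 to 8, one common convention, and uses an illustrative miRNA and a made-up target fragment.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=2, end=8):
    """Return the target-side sequence (5'->3') that pairs perfectly with
    the miRNA seed, taken as nucleotides start..end (1-based, inclusive)."""
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_canonical_sites(mirna, target):
    """Return 0-based start positions of perfect seed matches in `target`."""
    site = seed_site(mirna)
    positions = []
    pos = target.find(site)
    while pos != -1:
        positions.append(pos)
        pos = target.find(site, pos + 1)
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # illustrative miRNA sequence
target = "AAACUACCUCAGG"           # made-up target fragment

print(seed_site(mirna))                     # CUACCUC
print(find_canonical_sites(mirna, target))  # [3]
```

An exact-match search like this misses any site with a mismatch, wobble pair, or bulge in the seed, which is the gap that non-canonical site prediction aims to close.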
Cytidine base editors (CBEs) hold significant potential in genetic disease treatment and in breeding superior traits into animals. However, their large protein sizes limit their delivery by adeno-associated virus (AAV), given its packaging capacity of <4.7 kb. To overcome this, we employed a web-based fast generic discovery (WFG) strategy, identifying several small ssDNA deaminases (Sdds) and constructing multiple Sdd-CBE 1.0 versions. SflSdd-CBE 1.0 demonstrated high C-to-T editing efficiency, comparable to AncBE4max, while SviSdd-CBE 1.0 exhibited moderate C-to-T editing efficiency with a narrow editing window (C3 to C5). Utilizing AlphaFold2, we devised a one-step miniaturization strategy, reducing the size of Sdds while preserving their efficiency. Notably, we administered AAV8 expressing PCSK9-targeted sgRNA and SflSdd-CBEs (nSaCas9) 2.0 into mice, leading to gene-editing events (with editing efficiency up to 15%) and reduced serum cholesterol levels, underscoring the potential of Sdds in gene therapy. These findings offer new single-stranded editing tools for the treatment of rare genetic diseases.
{"title":"Accelerated discovery and miniaturization of novel single-stranded cytidine deaminases","authors":"Jiacheng Deng, Xueyuan Li, Hao Yu, Lin Yang, Ziru Wang, Wenfeng Yi, Ying Liu, Wenyu Xiao, Hongyong Xiang, Zicong Xie, Dongmei Lv, Hongsheng Ouyang, Daxin Pang, Hongming Yuan","doi":"10.1093/nar/gkae800","DOIUrl":"https://doi.org/10.1093/nar/gkae800","journal":{"name":"Nucleic Acids Research"},"publicationDate":"2024-09-14","publicationTypes":"Journal Article"}
What do we mean when we say ‘gene expression’? In the decades following Crick's 1958 central dogma of molecular biology, whereby genetic information flows from DNA (genes) to RNA (transcripts) to protein (products), we have learned a great deal about DNA, RNA, proteins, and the ensuing phenotypic changes. With the advent of high-throughput technologies (1990s), molecular biologists and computer scientists forged critical collaborations to understand the vast amount of data being generated, rapidly escalating gene expression research to the ‘omics’ level: entire sets of genes (genomes), transcribed RNAs (transcriptomes), and synthesized proteins (proteomes). However, some concessions came to be made for molecular biologists and computer scientists to understand each other—one of the most prevalent being the increasingly widespread use of ‘gene’ to mean ‘RNAs originating from a DNA segment’. This loosening of terminology, we will argue, creates ambiguity and confusion. We propose guidelines to increase precision and clarity when communicating about gene expression, most notably to reserve ‘gene’ for the DNA template and ‘transcript’ for the RNA transcribed from that gene. Striving to use perspicuous terminology will promote rigorous gene expression science and accelerate discovery in this highly promising area of biology.
{"title":"Striving for clarity in language about gene expression","authors":"Ana S G Cunningham, Myriam Gorospe","doi":"10.1093/nar/gkae764","DOIUrl":"https://doi.org/10.1093/nar/gkae764","url":null,"abstract":"What do we mean when we say ‘gene expression’? In the decades following Crick's 1958 central dogma of molecular biology, whereby genetic information flows from DNA (genes) to RNA (transcripts) to protein (products), we have learned a great deal about DNA, RNA, proteins, and the ensuing phenotypic changes. With the advent of high-throughput technologies (1990s), molecular biologists and computer scientists forged critical collaborations to understand the vast amount of data being generated, rapidly escalating gene expression research to the ‘omics’ level: entire sets of genes (genomes), transcribed RNAs (transcriptomes), and synthesized proteins (proteomes). However, some concessions came to be made for molecular biologists and computer scientists to understand each other—one of the most prevalent being the increasingly widespread use of ‘gene’ to mean ‘RNAs originating from a DNA segment’. This loosening of terminology, we will argue, creates ambiguity and confusion. We propose guidelines to increase precision and clarity when communicating about gene expression, most notably to reserve ‘gene’ for the DNA template and ‘transcript’ for the RNA transcribed from that gene. Striving to use perspicuous terminology will promote rigorous gene expression science and accelerate discovery in this highly promising area of biology.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":null,"pages":null},"PeriodicalIF":14.9,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142233313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}