Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY, USA
Determining RNA secondary structure is a core problem in computational biology. Fast algorithms for predicting secondary structure are fundamental to this task. We describe a modified formulation of the Zuker-Stiegler algorithm with coaxial stacking, a stabilising interaction in which the ends of helices in multi-loops are stacked. In particular, optimal coaxial stacking is computed as part of the dynamic programming state, rather than in an inner loop. We introduce a new notion of sparsity, which we call replaceability. Replaceability is a more general condition than the triangle inequality used by previous sparse folding methods, and it is applicable in more places. We also introduce non-monotonic candidate lists as an additional sparsification tool. Existing uses of the triangle inequality for sparsification can be thought of as an application of replaceability and monotonicity together. The modified recurrences, together with replaceability, allow sparsification to be applied to coaxial stacking as well, which increases the speed of the algorithm. We implemented this algorithm in software we call memerna, which we show to be the fastest exact (non-heuristic) implementation of RNA folding under the complete Turner 2004 model with coaxial stacking among several popular RNA folding tools that support coaxial stacking. We also introduce a new notation for secondary structure that includes information on coaxial stacking, terminal mismatches, and dangles (CTDs). The memerna package 0.1 release is available at https://github.com/Edgeworth/memerna/tree/release/0.1.
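To make the idea of triangle-inequality sparsification with candidate lists concrete, the following is a minimal sketch on a much simpler problem: Nussinov-style base-pair maximization. It is not the memerna algorithm, it does not use the Turner 2004 energy model, and it does not model coaxial stacking or replaceability. The function names, the `PAIRS` set, and the `min_loop` parameter are assumptions introduced only for illustration. The dense version tries every split point; the sparse version tries only split points stored in a candidate list, which is safe here because the score of a concatenation is never worse than the sum of the scores of its parts.

```python
# Illustrative sketch only: base-pair maximization, not an energy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in PAIRS


def max_pairs_dense(seq, min_loop=3):
    """O(n^3) reference: best[i][j] is the max number of pairs in seq[i:j]."""
    n = len(seq)
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        last = j - 1  # index of the right-most base of seq[i:j]
        for i in range(last, -1, -1):
            score = best[i][j - 1]  # case 1: base `last` is unpaired
            # case 2: base `last` pairs with some k, with at least
            # min_loop unpaired bases enclosed by the pair
            for k in range(i, last - min_loop):
                if can_pair(seq[k], seq[last]):
                    score = max(score, best[i][k] + 1 + best[k + 1][last])
            best[i][j] = score
    return best[0][n]


def max_pairs_sparse(seq, min_loop=3):
    """Same result, but split points are restricted to a candidate list."""
    n = len(seq)
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        last = j - 1
        # (k, closed_score) for pairs (k, last) that strictly beat every
        # other way of folding seq[k:j]; filled as i decreases.
        candidates = []
        for i in range(last, -1, -1):
            score = best[i][j - 1]  # base `last` is unpaired
            for k, closed in candidates:  # split just before a candidate k > i
                score = max(score, best[i][k] + closed)
            if last - i > min_loop and can_pair(seq[i], seq[last]):
                closed = 1 + best[i + 1][last]  # i pairs with `last`
                if closed > score:
                    # Not replaceable by any decomposition seen so far,
                    # so later (smaller) i must consider this split point.
                    candidates.append((i, closed))
                    score = closed
            best[i][j] = score
    return best[0][n]
```

A pair `(k, last)` that is left out of the candidate list can always be swapped for an alternative fold of the same region that scores at least as well, so skipping it never changes the optimum. In this toy setting that is exactly the triangle inequality; the abstract's replaceability condition is described as a generalization of it.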
Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models for the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper provides a general overview of the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved by applying these models to generative and predictive tasks. This survey aims to give readers a comprehensive understanding of the increasingly critical role of diffusion models.
The accurate and efficient biogenesis of RNA by cellular RNA polymerase (RNAP) requires accessory factors that regulate the initiation, elongation, and termination of transcription. Of the many discovered to date, the elongation regulator NusG-Spt5 is the only universally conserved transcription factor. With orthologs and paralogs found in all three domains of life, this ubiquity underscores their ancient and essential regulatory functions. NusG-Spt5 proteins evolved to maintain a similar binding interface to RNAP through contacts of the NusG N-terminal domain (NGN) that bridge the main DNA-binding cleft. We propose that variation in the strength of these contacts, modulated by tethering interactions, either decreases transcriptional pausing by smoothing the rugged thermodynamic landscape of transcript elongation or enhances pausing, depending on which conformation of RNAP is stabilized by NGN contacts. NusG-Spt5 contains one (in bacteria and archaea) or more (in eukaryotes) C-terminal domains that use a KOW fold to contact diverse targets, tether the NGN, and control RNA biogenesis. Recent work highlights these diverse functions in different organisms. Some bacteria contain multiple specialized NusG paralogs that regulate subsets of operons via sequence-specific targeting, controlling production of antibiotics, toxins, or capsule proteins. Despite their common origin, NusG orthologs can differ in their target selection, interacting partners, and effects on RNA synthesis. We describe the current understanding of NusG-Spt5 structure, interactions with RNAP and other regulators, and cellular functions, including significant recent progress from genome-wide analyses, single-molecule visualization, and cryo-EM. These recent findings highlight the remarkable diversity of function among these structurally conserved proteins.