The DNA damage response (DDR) ensures error-free DNA replication and transcription and is disrupted in numerous diseases. An ongoing challenge is to determine the proteins orchestrating DDR and their organization into complexes, including constitutive interactions and those responding to genomic insult. Here, we use multi-conditional network analysis to systematically map DDR assemblies at multiple scales. Affinity purifications of 21 DDR proteins, with/without genotoxin exposure, are combined with multi-omics data to reveal a hierarchical organization of 605 proteins into 109 assemblies. The map captures canonical repair mechanisms and proposes new DDR-associated proteins extending to stress, transport, and chromatin functions. We find that protein assemblies closely align with genetic dependencies in processing specific genotoxins and that proteins in multiple assemblies typically act in multiple genotoxin responses. Follow-up by DDR functional readouts newly implicates 12 assembly members in double-strand-break repair. The DNA damage response assemblies map is available for interactive visualization and query (ccmi.org/ddram/).
Single-cell RNA sequencing (scRNA-seq) is a powerful technique for describing cell states. Identifying the spatial arrangement of these states in tissues remains challenging, as existing methods require niche methodologies and expertise. Here, we describe segmentation by exogenous perfusion (SEEP), a rapid and integrated method to link surface proximity and environment accessibility to transcriptional identity within three-dimensional (3D) disease models. The method utilizes the steady-state diffusion kinetics of a fluorescent dye to establish a gradient along the radial axis of disease models. Classification of sample layers based on dye accessibility enables dissociated and sorted cells to be characterized by transcriptomic and regional identities. Using SEEP, we analyze spheroid, organoid, and in vivo tumor models of high-grade serous ovarian cancer (HGSOC). The results validate long-standing beliefs about the relationship between cell state and position while revealing new concepts regarding how spatially unique microenvironments influence the identity of individual cells within tumors.
A new method developed by Francisco Quintana's group, systematic perturbation of encapsulated associated cells followed by sequencing (SPEAC-seq), applies a CRISPR screen to co-cultured interacting cells to identify the ligands mediating cell-cell communication. Using this approach, the authors discover the molecular basis of a microglia-astrocyte feedback loop that suppresses neuroinflammatory disease.
The design choices underlying machine-learning (ML) models present important barriers to entry for many biologists who aim to incorporate ML in their research. Automated machine-learning (AutoML) algorithms can address many challenges that come with applying ML to the life sciences. However, these algorithms are rarely used in systems and synthetic biology studies because they typically do not explicitly handle biological sequences (e.g., nucleotide, amino acid, or glycan sequences) and cannot be easily compared with other AutoML algorithms. Here, we present BioAutoMATED, an AutoML platform for biological sequence analysis that integrates multiple AutoML methods into a unified framework. Users are automatically provided with relevant techniques for analyzing, interpreting, and designing biological sequences. BioAutoMATED predicts gene regulation, peptide-drug interactions, and glycan annotation, and designs optimized synthetic biology components, revealing salient sequence characteristics. By automating sequence modeling, BioAutoMATED allows life scientists to incorporate ML more readily into their work.
To build therapeutic strains, Escherichia coli Nissle (EcN) has been engineered to express antibiotics, toxin-degrading enzymes, immunoregulators, and anti-cancer chemotherapies. For efficacy, the recombinant genes need to be highly expressed, but this imposes a burden on the cell, and plasmids are difficult to maintain in the body. To address these problems, we have developed landing pads in the EcN genome and genetic circuits to control therapeutic gene expression. These tools were applied to EcN SYNB1618, undergoing clinical trials as a phenylketonuria treatment. The pathway for converting phenylalanine to trans-cinnamic acid was moved to a landing pad under the control of a circuit that keeps the pathway off during storage. The resulting strain (EcN SYN8784) achieved higher activity than EcN SYNB1618, reaching levels near those obtained when the pathway is carried on a plasmid. This work demonstrates a simple system for engineering EcN that aids quantitative strain design for therapeutics.
The transcriptional effector domains of transcription factors play a key role in controlling gene expression; however, their functional nature is poorly understood, hampering our ability to explore this fundamental dimension of gene regulatory networks. To map the trans-regulatory landscape in a complex eukaryote, we systematically characterized the putative transcriptional effector domains of over 400 Arabidopsis thaliana transcription factors for their capacity to modulate transcription. We demonstrate that transcriptional effector activity can be integrated into gene regulatory networks capable of elucidating the functional dynamics underlying gene expression patterns. We further show how our characterized domains can enhance genome engineering efforts and reveal how plant transcriptional activators share regulatory features conserved across distantly related eukaryotes. Our results provide a framework to systematically characterize the regulatory role of transcription factors at genome scale to understand the transcriptional wiring of biological systems.
Many biological circuits comprise sets of protein variants that interact with one another in a many-to-many, or promiscuous, fashion. These architectures can provide powerful computational capabilities that are especially critical in multicellular organisms. Understanding the principles of biochemical computations in these circuits could allow more precise control of cellular behaviors. However, these systems are inherently difficult to analyze, due to their large number of interacting molecular components, partial redundancies, and cell context dependence. Here, we discuss recent experimental and theoretical advances that are beginning to reveal how promiscuous circuits compute, what roles those computations play in natural biological contexts, and how promiscuous architectures can be applied for the design of synthetic multicellular behaviors.
Viruses encode transcriptional regulatory proteins critical for controlling viral and host gene expression. Given their multifunctional nature and high sequence divergence, it is unclear which viral proteins can affect transcription and which specific sequences contribute to this function. Using a high-throughput assay, we measured the transcriptional regulatory potential of over 60,000 protein tiles across ∼1,500 proteins from 11 coronaviruses and all nine human herpesviruses. We discovered hundreds of transcriptional effector domains, including a conserved repression domain in all coronavirus Spike homologs, dual activation-repression domains in viral interferon regulatory factors (VIRFs), and an activation domain in six herpesvirus homologs of the single-stranded DNA-binding protein that we show is important for viral replication and late gene expression in Kaposi's sarcoma-associated herpesvirus (KSHV). For the effector domains we identified, we investigated their mechanisms via high-throughput sequence and chemical perturbations, pinpointing sequence motifs essential for function. This work massively expands viral protein annotations, serving as a springboard for studying their biological and health implications and providing new candidates for compact gene regulation tools.