Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships
Michelle Kudron, Louis Gewirtzman, Alec Victorsen, Bridget C Lear, Dionne Vafeados, Jiahao Gao, Jinrui Xu, Swapna Samanta, Emily Frink, Adri Tran-Pearson, Chau Hyunh, Ann Hammonds, William Fisher, Martha L Wall, Greg Wesseling, Vanessa Hernandez, Zhichun Lin, Mary Kasparian, Kevin P White, Ravi Allada, Mark Gerstein, LaDeana Hillier, Susan E Celniker, Valerie Reinke, Robert Waterston
Genome Research, published 2024-10-22. DOI: https://doi.org/10.1101/gr.279037.124
Abstract
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the efforts of the Model Organism ENCyclopedia Of DNA Elements (modENCODE) and the model organism Encyclopedia of Regulatory Networks (modERN) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These datasets comprise 605 TFs identifying 3.6 M sites in the fly and 356 TFs identifying 0.9 M sites in the worm, and they represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks", that larger metapeaks have characteristics of high-occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single-cell RNA-seq data in a machine learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent and daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing GFP-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, through GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell type-specific TF-target relationships.
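The abstract does not say how metapeaks are built, so the following is only an illustrative sketch of the general idea of grouping nearby TF binding sites into clusters. It assumes a simple rule (peaks on the same chromosome that overlap or touch are merged), and the `Peak` record, the `cluster_peaks` function, the `max_gap` parameter, and the example coordinates are all hypothetical rather than taken from the paper.

```python
from collections import namedtuple

# Hypothetical record for one ChIP-seq peak: chromosome, start, end, TF name.
Peak = namedtuple("Peak", ["chrom", "start", "end", "tf"])


def cluster_peaks(peaks, max_gap=0):
    """Merge peaks on the same chromosome that overlap, touch, or lie
    within max_gap bp of the running cluster (an assumed rule, not the
    paper's procedure). Returns a list of dicts, one per cluster."""
    clusters = []
    for peak in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last["chrom"] == peak.chrom
            and peak.start <= last["end"] + max_gap
        ):
            # Extend the current cluster and record the contributing TF.
            last["end"] = max(last["end"], peak.end)
            last["tfs"].add(peak.tf)
        else:
            # Start a new cluster.
            clusters.append(
                {
                    "chrom": peak.chrom,
                    "start": peak.start,
                    "end": peak.end,
                    "tfs": {peak.tf},
                }
            )
    return clusters


# Made-up example peaks for illustration only.
peaks = [
    Peak("chr2L", 100, 200, "TF_A"),
    Peak("chr2L", 150, 260, "TF_B"),
    Peak("chr2L", 900, 950, "TF_A"),
    Peak("chr3R", 120, 180, "TF_C"),
]
clusters = cluster_peaks(peaks)
for c in clusters:
    # Cluster "size" here is simply the number of distinct TFs it contains.
    print(c["chrom"], c["start"], c["end"], len(c["tfs"]))
```

Under this assumed rule, the number of distinct TFs per cluster gives a rough measure of cluster size, which is the kind of quantity the abstract relates to HOT-region characteristics and motif importance.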
About the journal:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.