We present the first chromosome-level genome assembly of the grasshopper, Locusta migratoria, one of the largest insect genomes. We use coverage differences between females (XX) and males (X0) to identify the X Chromosome gene content, and find that the X Chromosome shows both complete dosage compensation in somatic tissues and an underrepresentation of testis-expressed genes. X-linked gene content from L. migratoria is highly conserved across seven insect orders, namely Orthoptera, Odonata, Phasmatodea, Hemiptera, Neuroptera, Coleoptera, and Diptera, and the 800 Mb grasshopper X Chromosome is homologous to the fly ancestral X Chromosome despite 400 million years of divergence, suggesting either repeated origin of sex chromosomes with highly similar gene content, or long-term conservation of the X Chromosome. We use this broad conservation of the X Chromosome to test for temporal dynamics of Fast-X evolution, and find evidence of a recent burst of evolution for new X-linked genes, in contrast to slow evolution of X-conserved genes.
{"title":"The grasshopper genome reveals long-term gene content conservation of the X Chromosome and temporal variation in X Chromosome evolution.","authors":"Xinghua Li, Judith E Mank, Liping Ban","doi":"10.1101/gr.278794.123","DOIUrl":"10.1101/gr.278794.123","url":null,"abstract":"<p><p>We present the first chromosome-level genome assembly of the grasshopper, <i>Locusta migratoria</i>, one of the largest insect genomes. We use coverage differences between females (XX) and males (X0) to identify the X Chromosome gene content, and find that the X Chromosome shows both complete dosage compensation in somatic tissues and an underrepresentation of testis-expressed genes. X-linked gene content from <i>L. migratoria</i> is highly conserved across seven insect orders, namely Orthoptera, Odonata, Phasmatodea, Hemiptera, Neuroptera, Coleoptera, and Diptera, and the 800 Mb grasshopper X Chromosome is homologous to the fly ancestral X Chromosome despite 400 million years of divergence, suggesting either repeated origin of sex chromosomes with highly similar gene content, or long-term conservation of the X Chromosome. We use this broad conservation of the X Chromosome to test for temporal dynamics to Fast-X evolution, and find evidence of a recent burst evolution for new X-linked genes in contrast to slow evolution of X-conserved genes.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"997-1007"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368200/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141893250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
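The coverage-based identification of X-linked sequence mentioned in the abstract can be illustrated with a minimal sketch. It assumes depth-normalized coverage values per scaffold for one female (XX) and one male (X0) sample; the function name and the -0.5 cutoff are hypothetical choices for illustration, not the authors' pipeline. In an X0 system, a male carries one X copy against the female's two, so X-linked scaffolds are expected near a male/female log2 ratio of -1 and autosomal scaffolds near 0.

```python
import math


def classify_by_coverage(female_cov: float, male_cov: float,
                         threshold: float = -0.5) -> str:
    """Label a scaffold from normalized female (XX) and male (X0) coverage.

    The threshold is a hypothetical midpoint between the expected
    autosomal log2 ratio (0) and the expected X-linked ratio (-1).
    """
    if female_cov <= 0 or male_cov <= 0:
        raise ValueError("coverage values must be positive")
    log2_ratio = math.log2(male_cov / female_cov)
    return "X-linked" if log2_ratio < threshold else "autosomal"


# Male coverage at half the female coverage suggests a single X copy in males.
print(classify_by_coverage(30.0, 15.0))
# Near-equal coverage suggests an autosomal scaffold.
print(classify_by_coverage(30.0, 29.0))
```

A real analysis would also need to handle repeats, mapping bias, and noisy low-coverage scaffolds, which this sketch ignores.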
Tom Denyer, Pin-Jou Wu, Kelly Colt, Bradley W Abramson, Zhili Pang, Pavel Solansky, Allen Mamerto, Tatsuya Nobori, Joseph R Ecker, Eric Lam, Todd P Michael, Marja C P Timmermans
Single-cell genomics permits a new resolution in the examination of molecular and cellular dynamics, allowing global, parallel assessments of cell types and cellular behaviors through development and in response to environmental circumstances, such as interaction with water and the light-dark cycle of the Earth. Here, we leverage the smallest, and possibly most structurally reduced, plant, the semiaquatic Wolffia australiana, to understand dynamics of cell expression in these contexts at the whole-plant level. We examined single-cell-resolution RNA-sequencing data and found Wolffia cells divide into four principal clusters representing the above- and below-water-situated parenchyma and epidermis. Although these tissues share transcriptomic similarity with model plants, they display distinct adaptations that Wolffia has made for the aquatic environment. Within this broad classification, discrete subspecializations are evident, with select cells showing unique transcriptomic signatures associated with developmental maturation and specialized physiologies. Assessing this simplified biological system temporally at two key time-of-day (TOD) transitions, we identify additional TOD-responsive genes previously overlooked in whole-plant transcriptomic approaches and demonstrate that the core circadian clock machinery and its downstream responses can vary in cell-specific manners, even in this simplified system. Distinctions between cell types and their responses to submergence and/or TOD are driven by expression changes of unexpectedly few genes, characterizing Wolffia as a highly streamlined organism with the majority of genes dedicated to fundamental cellular processes. Wolffia provides a unique opportunity to apply reductionist biology to elucidate signaling functions at the organismal level, for which this work provides a powerful resource.
{"title":"Streamlined spatial and environmental expression signatures characterize the minimalist duckweed <i>Wolffia australiana</i>.","authors":"Tom Denyer, Pin-Jou Wu, Kelly Colt, Bradley W Abramson, Zhili Pang, Pavel Solansky, Allen Mamerto, Tatsuya Nobori, Joseph R Ecker, Eric Lam, Todd P Michael, Marja C P Timmermans","doi":"10.1101/gr.279091.124","DOIUrl":"10.1101/gr.279091.124","url":null,"abstract":"<p><p>Single-cell genomics permits a new resolution in the examination of molecular and cellular dynamics, allowing global, parallel assessments of cell types and cellular behaviors through development and in response to environmental circumstances, such as interaction with water and the light-dark cycle of the Earth. Here, we leverage the smallest, and possibly most structurally reduced, plant, the semiaquatic <i>Wolffia australiana</i>, to understand dynamics of cell expression in these contexts at the whole-plant level. We examined single-cell-resolution RNA-sequencing data and found <i>Wolffia</i> cells divide into four principal clusters representing the above- and below-water-situated parenchyma and epidermis. Although these tissues share transcriptomic similarity with model plants, they display distinct adaptations that <i>Wolffia</i> has made for the aquatic environment. Within this broad classification, discrete subspecializations are evident, with select cells showing unique transcriptomic signatures associated with developmental maturation and specialized physiologies. Assessing this simplified biological system temporally at two key time-of-day (TOD) transitions, we identify additional TOD-responsive genes previously overlooked in whole-plant transcriptomic approaches and demonstrate that the core circadian clock machinery and its downstream responses can vary in cell-specific manners, even in this simplified system. 
Distinctions between cell types and their responses to submergence and/or TOD are driven by expression changes of unexpectedly few genes, characterizing <i>Wolffia</i> as a highly streamlined organism with the majority of genes dedicated to fundamental cellular processes. <i>Wolffia</i> provides a unique opportunity to apply reductionist biology to elucidate signaling functions at the organismal level, for which this work provides a powerful resource.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1106-1120"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368201/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141476458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sizhen Li, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi, Lorenzo Kogler-Anele, Milad Miladi, Jacob Miner, Fabien Pertuy, Dinghai Zheng, Jun Wang, Akshay Balsubramani, Khang Tran, Minnie Zacharia, Monica Wu, Xiaobo Gu, Ryan Clinton, Carla Asquith, Joseph Skaleski, Lianne Boeglin, Sudha Chivukula, Anusha Dias, Tod Strugnell, Fernando Ulloa Montoya, Vikram Agarwal, Ziv Bar-Joseph, Sven Jager
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.
{"title":"CodonBERT large language model for mRNA vaccines.","authors":"Sizhen Li, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi, Lorenzo Kogler-Anele, Milad Miladi, Jacob Miner, Fabien Pertuy, Dinghai Zheng, Jun Wang, Akshay Balsubramani, Khang Tran, Minnie Zacharia, Monica Wu, Xiaobo Gu, Ryan Clinton, Carla Asquith, Joseph Skaleski, Lianne Boeglin, Sudha Chivukula, Anusha Dias, Tod Strugnell, Fernando Ulloa Montoya, Vikram Agarwal, Ziv Bar-Joseph, Sven Jager","doi":"10.1101/gr.278870.123","DOIUrl":"10.1101/gr.278870.123","url":null,"abstract":"<p><p>mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. 
CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1027-1035"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368176/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141476456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
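The abstract's point that CodonBERT takes codons rather than single nucleotides as inputs can be made concrete with a small tokenization sketch. This is only an illustration of codon-level splitting under simple assumptions (an in-frame coding sequence, DNA letters converted to RNA); the function name is hypothetical and this is not the model's actual tokenizer.

```python
def codon_tokens(mrna: str) -> list[str]:
    """Split an in-frame coding sequence into non-overlapping codons."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


# Nine nucleotides become three codon tokens instead of nine base tokens.
print(codon_tokens("AUGGCCUAA"))
```

With codon tokens, synonymous sequences encoding the same peptide differ token by token, which is the level at which sequence optimization choices are made.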
Caitlin A Loh, Danielle A Shields, Adam Schwing, Gilad D Evrony
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depend on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling, "stutter" artifacts, with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.
{"title":"High-fidelity, large-scale targeted profiling of microsatellites.","authors":"Caitlin A Loh, Danielle A Shields, Adam Schwing, Gilad D Evrony","doi":"10.1101/gr.278785.123","DOIUrl":"10.1101/gr.278785.123","url":null,"abstract":"<p><p>Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depends on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant \"stutter\" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling-\"stutter\" artifacts-with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. 
Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1008-1026"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141626499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guanjue Xiang, Xi He, Belinda M Giardine, Kathryn J Isaac, Dylan J Taylor, Rajiv C McCoy, Camden Jansen, Cheryl A Keller, Alexander Q Wixom, April Cockburn, Amber Miller, Qian Qi, Yanghua He, Yichao Li, Jens Lichtenberg, Elisabeth F Heuston, Stacie M Anderson, Jing Luan, Marit W Vermunt, Feng Yue, Michael E G Sauria, Michael C Schatz, James Taylor, Berthold Göttgens, Jim R Hughes, Douglas R Higgs, Mitchell J Weiss, Yong Cheng, Gerd A Blobel, David M Bodine, Yu Zhang, Qunhua Li, Shaun Mahony, Ross C Hardison
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
{"title":"Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes.","authors":"Guanjue Xiang, Xi He, Belinda M Giardine, Kathryn J Isaac, Dylan J Taylor, Rajiv C McCoy, Camden Jansen, Cheryl A Keller, Alexander Q Wixom, April Cockburn, Amber Miller, Qian Qi, Yanghua He, Yichao Li, Jens Lichtenberg, Elisabeth F Heuston, Stacie M Anderson, Jing Luan, Marit W Vermunt, Feng Yue, Michael E G Sauria, Michael C Schatz, James Taylor, Berthold Göttgens, Jim R Hughes, Douglas R Higgs, Mitchell J Weiss, Yong Cheng, Gerd A Blobel, David M Bodine, Yu Zhang, Qunhua Li, Shaun Mahony, Ross C Hardison","doi":"10.1101/gr.277950.123","DOIUrl":"10.1101/gr.277950.123","url":null,"abstract":"<p><p>Knowledge of locations and activities of <i>cis</i>-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. 
Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1089-1105"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141476457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juri Kuronen, Samuel T Horsfield, Anna K Pöntinen, Sudaraka Mallawaarachchi, Sergio Arredondo-Alonso, Harry Thorpe, Rebecca A Gladstone, Rob J L Willems, Stephen D Bentley, Nicholas J Croucher, Johan Pensar, John A Lees, Gerry Tonkin-Hill, Jukka Corander
Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes. Our approach uses a compact colored de Bruijn graph to approximate the intragenome distances between pairs of loci for a collection of bacterial genomes to account for the impacts of linkage disequilibrium (LD). We demonstrate the versatility of our approach to efficiently identify associations between loci linked with drug resistance and adaptation to the hospital niche in the major human bacterial pathogens Streptococcus pneumoniae and Enterococcus faecalis.
{"title":"Pangenome-spanning epistasis and coselection analysis via de Bruijn graphs.","authors":"Juri Kuronen, Samuel T Horsfield, Anna K Pöntinen, Sudaraka Mallawaarachchi, Sergio Arredondo-Alonso, Harry Thorpe, Rebecca A Gladstone, Rob J L Willems, Stephen D Bentley, Nicholas J Croucher, Johan Pensar, John A Lees, Gerry Tonkin-Hill, Jukka Corander","doi":"10.1101/gr.278485.123","DOIUrl":"10.1101/gr.278485.123","url":null,"abstract":"<p><p>Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes. Our approach uses a compact colored de Bruijn graph to approximate the intragenome distances between pairs of loci for a collection of bacterial genomes to account for the impacts of linkage disequilibrium (LD). 
We demonstrate the versatility of our approach to efficiently identify associations between loci linked with drug resistance and adaptation to the hospital niche in the major human bacterial pathogens <i>Streptococcus pneumoniae</i> and <i>Enterococcus faecalis</i>.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1081-1088"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368177/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cell identity annotation for single-cell transcriptome data is a crucial process for constructing cell atlases, unraveling pathogenesis, and inspiring therapeutic approaches. Currently, the efficacy of existing methodologies is contingent upon specific data sets. Nevertheless, such data are often sourced from various batches, sequencing technologies, tissues, and even species. Notably, the gene regulatory relationship remains unaffected by the aforementioned factors, highlighting the extensive gene interactions within organisms. Therefore, we propose scHGR, an automated annotation tool designed to leverage gene regulatory relationships in constructing gene-mediated cell communication graphs for single-cell transcriptome data. This strategy helps reduce noise from diverse data sources while establishing distant cellular connections, yielding valuable biological insights. Experiments involving 22 scenarios demonstrate that scHGR precisely and consistently annotates cell identities, benchmarked against state-of-the-art methods. Crucially, scHGR uncovers novel subtypes within peripheral blood mononuclear cells, specifically from CD4+ T cells and cytotoxic T cells. Furthermore, by characterizing a cell atlas comprising 56 cell types for COVID-19 patients, scHGR identifies vital factors like IL1 and calcium ions, offering insights for targeted therapeutic interventions.
{"title":"A gene regulatory network-aware graph learning method for cell identity annotation in single-cell RNA-seq data.","authors":"Mengyuan Zhao, Jiawei Li, Xiaoyi Liu, Ke Ma, Jijun Tang, Fei Guo","doi":"10.1101/gr.278439.123","DOIUrl":"10.1101/gr.278439.123","url":null,"abstract":"<p><p>Cell identity annotation for single-cell transcriptome data is a crucial process for constructing cell atlases, unraveling pathogenesis, and inspiring therapeutic approaches. Currently, the efficacy of existing methodologies is contingent upon specific data sets. Nevertheless, such data are often sourced from various batches, sequencing technologies, tissues, and even species. Notably, the gene regulatory relationship remains unaffected by the aforementioned factors, highlighting the extensive gene interactions within organisms. Therefore, we propose scHGR, an automated annotation tool designed to leverage gene regulatory relationships in constructing gene-mediated cell communication graphs for single-cell transcriptome data. This strategy helps reduce noise from diverse data sources while establishing distant cellular connections, yielding valuable biological insights. Experiments involving 22 scenarios demonstrate that scHGR precisely and consistently annotates cell identities, benchmarked against state-of-the-art methods. Crucially, scHGR uncovers novel subtypes within peripheral blood mononuclear cells, specifically from CD4<sup>+</sup> T cells and cytotoxic T cells. 
Furthermore, by characterizing a cell atlas comprising 56 cell types for COVID-19 patients, scHGR identifies vital factors like IL1 and calcium ions, offering insights for targeted therapeutic interventions.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1036-1051"},"PeriodicalIF":6.2,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368180/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanina Timasheva, Kaido Lepik, Orsolya Liska, Balázs Papp, Zoltan Kutalik
Natural selection acts ubiquitously on complex human traits, predominantly constraining the occurrence of extreme phenotypes (stabilizing selection). These constraints propagate to DNA sequence variants associated with traits under selection. The genetic imprints of such evolutionary events can thus be detected by combining effect size estimates from genetic association studies with the corresponding allele frequencies. While this approach has been successfully applied to high-level traits, the prevalence and mode of selection acting on molecular traits remain poorly understood. Here, we estimate the action of natural selection on genetic variants associated with metabolite levels, an important layer of molecular traits. By leveraging summary statistics of published genome-wide association studies with large sample sizes, we find strong evidence of stabilizing selection for 15 out of 97 plasma metabolites. Mendelian randomization analysis revealed that metabolites under stronger stabilizing selection display larger effects on a range of clinically relevant complex traits, suggesting that maintaining a disease-free profile may be an important source of selective constraints on the metabolome. Metabolites under strong stabilizing selection in humans are also more conserved in their concentrations among diverse mammalian species, suggesting shared selective forces across micro- and macroevolutionary time scales. Finally, we also found evidence for both disruptive and directional selection on specific lipid metabolites, potentially indicating ongoing evolutionary adaptation in humans. Overall, this study demonstrates that variation in metabolite levels among humans is frequently shaped by natural selection, which may act through the metabolites' causal impact on disease susceptibility.
{"title":"Widespread natural selection on metabolite levels in humans","authors":"Yanina Timasheva, Kaido Lepik, Orsolya Liska, Balázs Papp, Zoltan Kutalik","doi":"10.1101/gr.278756.123","DOIUrl":"https://doi.org/10.1101/gr.278756.123","url":null,"abstract":"Natural selection acts ubiquitously on complex human traits, predominantly constraining the occurrence of extreme phenotypes (stabilizing selection). These constraints propagate to DNA sequence variants associated with traits under selection. The genetic imprints of such evolutionary events can thus be detected via combining effect size estimates from genetic association studies and the corresponding allele frequencies. While this approach has been successfully applied to high-level traits, the prevalence and mode of selection acting on molecular traits remains poorly understood. Here, we estimate the action of natural selection on genetic variants associated with metabolite levels, an important layer of molecular traits. By leveraging summary statistics of published genome-wide association studies with large sample sizes, we find strong evidence of stabilizing selection for 15 out of 97 plasma metabolites. Mendelian randomization analysis revealed that metabolites under stronger stabilizing selection display larger effects on a range of clinically relevant complex traits, suggesting that maintaining a disease-free profile may be an important source of selective constraints on the metabolome. Metabolites under strong stabilizing selection in humans are also more conserved in their concentrations among diverse mammalian species, suggesting shared selective forces across micro and macroevolutionary time scales. Finally, we also found evidence for both disruptive and directional selection on specific lipid metabolites, potentially indicating ongoing evolutionary adaptation in humans. Overall, this study demonstrates that variation in metabolite levels among humans is frequently shaped by natural selection and this may act through their causal impact on disease susceptibility.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"3 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141994475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
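As a rough illustration of the signature described in the metabolite abstract above (effect size estimates combined with allele frequencies), the sketch below fits a log-log regression of squared effect size on heterozygosity 2p(1-p). A negative slope means that rarer variants carry larger effects, a pattern commonly read as consistent with selection against trait-altering variants. The function name `estimate_alpha`, the toy inputs, and the ordinary least-squares fit are assumptions made for illustration; the abstract does not specify the authors' estimator, and a real analysis would also have to handle sampling error in the effect estimates and linkage disequilibrium.

```python
import math

def estimate_alpha(freqs, betas):
    """Return the OLS slope of log(beta^2) on log(2p(1-p)).

    freqs: allele frequencies p, each strictly between 0 and 1
    betas: per-allele effect size estimates, each non-zero
    """
    xs = [math.log(2 * p * (1 - p)) for p in freqs]
    ys = [math.log(b * b) for b in betas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy data (hypothetical): effects constructed so that
# beta^2 = [2p(1-p)]^(-0.5), i.e. the true slope is -0.5.
freqs = [0.01, 0.05, 0.2, 0.5]
betas = [(2 * p * (1 - p)) ** -0.25 for p in freqs]

print(round(estimate_alpha(freqs, betas), 6))
```

On noise-free toy data the slope is recovered exactly; with real GWAS summary statistics the squared estimates are inflated by their standard errors, so the naive slope should be treated as a starting point only.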
Emily Lowry, Yiqing Wang, Tal Dagan, Amir Mitchell
Colibactin, produced primarily by Escherichia coli strains of the B2 phylogroup, crosslinks DNA and can promote colon cancer in human hosts. We investigated the toxin's impact on colibactin producers and on bacteria co-cultured with producing cells. Using genome-wide genetic screens and mutation accumulation experiments, we uncovered the cellular pathways that mitigate colibactin damage and revealed the specific mutations it induces. We discovered that while colibactin targets A/T-rich motifs, as observed in human colon cells, it induces a bacteria-unique mutation pattern. Based on this pattern, we predicted that long-term colibactin exposure will culminate in a genomic bias in trinucleotide composition. We tested this prediction by analyzing thousands of E. coli genomes and found that colibactin-producing strains indeed show the predicted skew in trinucleotide composition. Our work reveals a bacteria-specific mutation pattern and suggests that the resistance protein encoded on the colibactin pathogenicity island is insufficient to prevent self-inflicted DNA damage.
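Colibactin is produced mainly by Escherichia coli strains of the B2 phylogroup; it crosslinks DNA and can induce colon cancer in human hosts. We studied the effect of this toxin on colibactin producers and on bacteria co-cultured with producer cells. Through genome-wide genetic screens and mutation accumulation experiments, we identified the cellular pathways that mitigate colibactin damage and revealed the specific mutations it induces. We found that although colibactin targets A/T-rich motifs, as observed in human colon cells, it induces a mutation pattern unique to bacteria. Based on this pattern, we predicted that long-term exposure to colibactin will ultimately produce a genomic bias in trinucleotide composition. We verified this prediction by analyzing thousands of E. coli genomes and found that colibactin-producing strains do show the predicted skew in trinucleotide composition. Our work reveals a bacteria-specific mutation pattern and indicates that the resistance protein encoded on the colibactin pathogenicity island is not sufficient to prevent self-inflicted DNA damage.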
{"title":"Colibactin leads to a bacteria-specific mutation pattern and self-inflicted DNA damage","authors":"Emily Lowry, Yiqing Wang, Tal Dagan, Amir Mitchell","doi":"10.1101/gr.279517.124","DOIUrl":"https://doi.org/10.1101/gr.279517.124","url":null,"abstract":"Colibactin produced primarily by <em>Escherichia coli</em> strains of the B2 phylogroup crosslinks DNA and can promote colon cancer in human hosts. We investigated the toxin's impact on colibactin producers and on bacteria co-cultured with producing cells. Using genome-wide genetic screens and mutation accumulation experiments we uncovered the cellular pathways that mitigate colibactin damage and revealed the specific mutations it induces. We discovered that while colibactin targets A/T rich motifs, as observed in human colon cells, it induces a bacteria-unique mutation pattern. Based on this pattern, we predicted that long-term colibactin exposure will culminate in a genomic bias in trinucleotide composition. We tested this prediction by analyzing thousands of <em>E. coli</em> genomes and found that colibactin-producing strains indeed show the predicted skewness in trinucleotide composition. Our work revealed a bacteria-specific mutation pattern and suggests that the resistance protein encoded on the colibactin pathogenicity island is insufficient in preventing self-inflicted DNA damage.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"96 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141994476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
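The colibactin entry above rests on comparing trinucleotide composition across genomes. As a minimal sketch of that kind of measurement, the code below counts overlapping trinucleotides on both strands of a sequence and converts the counts to frequencies. The function names, the decision to count both strands, and the toy sequence are assumptions for illustration; the abstract does not describe the authors' exact counting scheme or which trinucleotides are skewed.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def trinucleotide_freqs(seq):
    """Return {trinucleotide: frequency} over both strands of seq.

    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 2):
            tri = strand[i:i + 3]
            if set(tri) <= set("ACGT"):
                counts[tri] += 1
    total = sum(counts.values())
    return {tri: c / total for tri, c in counts.items()}

# Toy example (hypothetical sequence, not real genome data).
print(trinucleotide_freqs("AAAT"))
```

Given frequency tables for a producing and a non-producing strain, a per-trinucleotide ratio or difference would then show which motifs are enriched or depleted.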
Ashlyn G Anderson, Belle A Moyers, Jacob M Loupe, Ivan Rodriguez-Nunez, Stephanie A Felker, James M.J. Lawlor, William E Bunney, Blynn G Bunney, Preston M Cartagena, Adolfo Sequeira, Stanley Watson, Huda Akil, Eric M Mendenhall, Gregory M Cooper, Richard M. Myers
Transcription factors (TFs) regulate gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Since TF occupancy is driven in part by recognition of DNA sequence, genetic variation can influence TF-DNA associations and gene regulation. To identify variants that impact TF binding in human brain tissues, we assessed allele-specific binding (ASB) at heterozygous variants for 94 TFs in 9 brain regions from two donors. Leveraging graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signals between alleles at heterozygous variants within each brain region and identified thousands of variants exhibiting ASB for at least one TF. ASB reproducibility was measured by comparisons between independent experiments both within and between donors. We found that rarer alleles in the general population more frequently led to reduced TF binding, whereas common variation had an equal likelihood of increasing or decreasing binding. Motif analysis revealed TF-specific effects, with ASB variants for certain TFs displaying a greater incidence of motif alterations, as well as enrichments for variants under purifying selection. Notably, neuron-specific candidate cis-regulatory elements (cCREs) showed depletion for ASB variants. We identified 2,670 ASB variants with prior evidence of allele-specific gene expression in the brain from GTEx data and observed increasing eQTL effect direction concordance as ASB significance increases. These results provide a valuable and unique resource for mechanistic analysis of cis-regulatory variation in human brain tissue.
{"title":"Allele specific transcription factor binding across human brain regions offers mechanistic insight into eQTLs","authors":"Ashlyn G Anderson, Belle A Moyers, Jacob M Loupe, Ivan Rodriguez-Nunez, Stephanie A Felker, James M.J. Lawlor, William E Bunney, Blynn G Bunney, Preston M Cartagena, Adolfo Sequeira, Stanley Watson, Huda Akil, Eric M Mendenhall, Gregory M Cooper, Richard M. Myers","doi":"10.1101/gr.278601.123","DOIUrl":"https://doi.org/10.1101/gr.278601.123","url":null,"abstract":"Transcription Factors (TFs) regulate gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Since TF occupancy is driven in part by recognition of DNA sequence, genetic variation can influence TF-DNA associations and gene regulation. To identify variants that impact TF binding in human brain tissues, we assessed allele specific binding (ASB) at heterozygous variants for 94 TFs in 9 brain regions from two donors. Leveraging graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signals between alleles at heterozygous variants within each brain region and identified thousands of variants exhibiting ASB for at least one TF. ASB reproducibility was measured by comparisons between independent experiments both within and between donors. We found that rarer alleles in the general population more frequently led to reduced TF binding, whereas common variation had an equal likelihood of increasing or decreasing binding. Motif analysis revealed TF-specific effects, with ASB variants for certain TFs displaying a greater incidence of motif alterations, as well as enrichments for variants under purifying selection. Notably, neuron-specific <em>cis</em>-regulatory elements (cCREs) showed depletion for ASB variants. We identified 2,670 ASB variants with prior evidence of allele-specific gene expression in the brain from GTEx data and observed increasing eQTL effect direction concordance as ASB significance increases. These results provide a valuable and unique resource for mechanistic analysis of <em>cis</em>-regulatory variation in human brain tissue.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"38 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141994473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
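The allele-specific binding entry above compares ChIP-seq signal between the two alleles at a heterozygous variant. One simple way to express that comparison, shown here only as a sketch, is an exact two-sided binomial test of the reference and alternate read counts against an expected 50:50 split. The function name `asb_binomial_p`, the 0.5 null proportion, and the toy read counts are assumptions; the abstract does not state which statistical test the authors used, and their graph-genome alignment step for reducing reference bias is not modeled here.

```python
from math import comb

def asb_binomial_p(ref_reads, alt_reads, null_ref_fraction=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely
    than the observed reference-read count under the null fraction.
    """
    n = ref_reads + alt_reads
    p = null_ref_fraction
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    tail = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, tail)

# Toy counts (hypothetical): 10 reference reads, 0 alternate reads.
print(asb_binomial_p(10, 0))
```

In practice such p-values would be corrected for multiple testing across variants and TFs, and low-coverage sites would usually be filtered out first.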