Pub Date : 2024-08-09 DOI: 10.1101/2024.08.08.607267
Luozixian Wang, Daniel Urrutia-Cabrera, Sandy Shen-Chi Hung, Alex W Hewitt, Samuel W Lukowski, Careen Foord, Peng-Yuan Wang, Hagen Tilgner, Raymond Wong
Recent single-cell transcriptomic profiling of the human retina has provided important insights into the genetic signals in heterogeneous retinal cell populations that enable vision. However, conventional single-cell RNA-seq with 3' short-read sequencing is not well suited to identifying isoform variants. Here we used Iso-Seq with full-length sequencing to profile the human retina at single-cell resolution for isoform discovery. We generated a retina transcriptome dataset consisting of 25,302 nuclei from three donor retinas and detected 49,710 known transcripts and 241,949 novel transcripts across major retinal cell types. We surveyed the use of alternative promoters to drive transcript variant expression and showed that 1-8% of genes used multiple promoters across major retinal cell types. Our results also enabled expression profiling of novel transcript variants of inherited retinal disease (IRD) genes and identified differential exon splicing across major retinal cell types. Altogether, this full-length, single-cell-resolution dataset highlights the potential of Iso-Seq to map isoform diversity in the human retina, providing an expanded view of its complex transcriptomic landscape.
Title : Iso-Seq enables discovery of novel isoform variants in human retina at single cell resolution
Journal : bioRxiv - Genomics
Pub Date : 2024-08-09 DOI: 10.1101/2024.08.07.606392
Connor C Littlefield, Jose M Lazaro-Guevara, Devorah Stucki, Michael Lansford, Melissa H Pezzolesi, Emma J Taylor, Etoni Ma'asi C Wolfgramm, Jacob Taloa, Kime Lao, C Dave Dumaguit, Perry G Ridge, Justina P Tavana, William L Holland, Kalani L Raphael, Marcus G. Pezzolesi
Individuals of Pacific ancestry experience some of the greatest health disparities yet remain vastly underrepresented in genomic research, including in currently available linear and pangenome references. To begin addressing this, we developed the first Pacific ancestry pangenome reference using 23 individuals with diverse Pacific ancestry. We assembled 46 haploid genomes from these 23 individuals, resulting in highly accurate and contiguous genome assemblies with an average quality value of 55.0 and an average N50 of 40.7 Mb, marking the first de novo assembly of highly accurate Pacific ancestry genomes. We combined these assemblies to create a pangenome reference, which added 30.6 Mb of novel sequence missing from the Human Pangenome Reference Consortium (HPRC) reference. Mapping short reads to this pangenome reduced variant-calling errors and yielded more true-positive variants than mapping to the HPRC and T2T-CHM13 references. This Pacific ancestry pangenome reference serves as a resource to enhance genetic analyses for this underserved population.
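The two assembly metrics quoted in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions, which the abstract itself does not spell out: N50 as the contig length at which the longest contigs cumulatively cover at least half of the assembly, and the quality value as a Phred-scaled consensus error rate. The function names and the toy contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Assumed definition: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy contig lengths (hypothetical, not from the study)
print(n50([2, 3, 4, 5, 6]))      # 5
# Error probability implied by QV 55.0 under the Phred assumption
print(qv_to_error_rate(55.0))
```

Under that Phred assumption, an average quality value of 55.0 corresponds to roughly 3.2 errors per million bases.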
Title : A Draft Pacific Ancestry Pangenome Reference
Journal : bioRxiv - Genomics
Pub Date : 2024-08-09 DOI: 10.1101/2024.08.08.607201
Nikol Chantzi, Ilias Georgakopoulos-Soares
Short tandem repeats (STRs) are widespread, dynamic repetitive elements with a number of biological functions and relevance to human diseases. However, their prevalence across taxa remains poorly characterized. Here we examined the impact of STRs in the genomes of 117,253 organisms spanning the tree of life. We find large differences in STR frequencies between organismal genomes, and these differences are largely driven by the taxonomic group to which an organism belongs. Using simulated genomes, we find that on average there is no enrichment of STRs in bacterial and archaeal genomes, suggesting that these genomes are not particularly repetitive. In contrast, we find that eukaryotic genomes are orders of magnitude more repetitive than expected. STRs are preferentially located at functional loci in specific taxa. Finally, we use the recently completed Telomere-to-Telomere genomes of human and other great apes and find that STRs are highly abundant and variable between primate species, particularly in peri/centromeric regions. We conclude that STRs have expanded in eukaryotic and viral lineages but not in archaea or bacteria, resulting in large discrepancies in genomic composition.
Title : The repertoire of short tandem repeats across the tree of life
Journal : bioRxiv - Genomics
Pub Date : 2024-08-09 DOI: 10.1101/2024.08.08.604307
Melanie Lemaire, Keaton Warrick Smith, Samantha L Wilson
Infertility affects up to 17.5% of reproductive-aged couples worldwide. To aid conception, many couples turn to assisted reproductive technology, such as in vitro fertilization (IVF). IVF can introduce both physical and environmental stressors that may alter the regulation of DNA methylation, an important and dynamic process during early fetal development. This meta-analysis aims to assess differences in the placental DNA methylome between spontaneous and IVF pregnancies. We identified three studies from NCBI GEO that measured DNA methylation with an Illumina Infinium microarray in post-delivery placental tissue from both IVF and spontaneous pregnancies, for a total of 575 samples (n = 96 IVF, n = 479 spontaneous). While there were no significant or differentially methylated CpGs in the mixed or female-stratified populations, we identified 9 CpGs that reached statistical significance (FDR < 0.05) between IVF (n = 56) and spontaneous (n = 238) placentae. 7 autosomal CpGs and 1 X chromosome CpG were hypermethylated and 2 autosomal CpGs were hypomethylated in the IVF placentae compared with spontaneous placentae. Autosomal CpGs closest to LIPJ, EEF1A2, and FBRSL1 also met our criteria to be classified as biologically differentially methylated CpGs (FDR < 0.05; |δβ| > 0.05). When analyzing variability differences in δβ values between IVF females, IVF males, spontaneous females and spontaneous males, we found a significant shift to greater variability in both IVF males and females compared with spontaneous (p < 2.2e-16, p < 2.2e-16). Trends of variability were further analyzed in the biologically differentially methylated autosomal CpGs near LIPJ, EEF1A2, and FBRSL1; while these regions were statistically significant in males, the female δβ and δCoVs followed a similar trend that differed in magnitude.
In both males and females, there were statistically significant differences in the proportions of endothelial cells, Hofbauer cells, stromal cells and syncytiotrophoblasts between spontaneous and IVF populations. We also observed significant differences between sexes within reproduction type in syncytiotrophoblasts and trophoblasts. The results of this study are critical to further understanding the impact of IVF on tissue epigenetics, which may help to investigate the connections between IVF and negative pregnancy outcomes. Additionally, our study supports considering sex-specific differences in placental DNA methylation and cell composition as factors in future placental DNA methylation analyses.
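The two-part criterion in this abstract (FDR below 0.05 and an absolute methylation difference above 0.05) can be sketched in Python as follows. The abstract does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the function names, CpG identifiers and numbers are hypothetical toy values, not results from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the study's FDR correction is BH; the abstract only says "FDR".
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def biologically_differential(cpgs, fdr_cutoff=0.05, delta_beta_cutoff=0.05):
    """Select CpGs passing both the FDR and the |delta beta| thresholds.

    cpgs: list of (cpg_id, p_value, delta_beta) tuples.
    """
    adjusted = benjamini_hochberg([p for _, p, _ in cpgs])
    return [
        cpg_id
        for (cpg_id, _, delta), q in zip(cpgs, adjusted)
        if q < fdr_cutoff and abs(delta) > delta_beta_cutoff
    ]


# Hypothetical toy input: (id, raw p-value, IVF minus spontaneous beta difference)
toy = [("cg_a", 0.001, 0.08), ("cg_b", 0.002, 0.01), ("cg_c", 0.5, 0.2)]
print(biologically_differential(toy))  # ['cg_a']
```

In the toy input, `cg_b` is statistically significant but its effect size is too small, and `cg_c` has a large effect but does not survive correction, so only `cg_a` passes both filters.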
Title : Examining Sex-Specific DNA Methylation and Variability Post In Vitro Fertilization
Journal : bioRxiv - Genomics
Pub Date : 2024-08-09 DOI: 10.1101/2024.08.07.606986
Chiara Giannuzzi, Mario Baumgart, Francesco Neri, Alessandro Cellerino
Aging, characterized by a gradual decline in organismal fitness, is the primary risk factor for numerous diseases, including cancer, cardiovascular, and neurodegenerative disorders. The inter-individual variability in aging and disease susceptibility has led to the concept of biological age, an indirect measure of an individual's relative fitness. Epigenetic changes, particularly DNA methylation, have emerged as reliable biomarkers for estimating biological age, leading to the development of predictive models known as epigenetic clocks. Initially created for humans, these clocks have been extended to various mammalian species. Here we set out to extend these tools to the short-lived killifish, Nothobranchius furzeri. This species, with its remarkably short lifespan and expression of canonical aging hallmarks, offers a unique model for experimental aging studies. We developed an epigenetic clock for N. furzeri using reduced-representation bisulfite sequencing (RRBS) to analyze DNA methylation in brain and caudal fin tissues across different ages. Our study involved generating comprehensive DNA methylation datasets and employing machine learning to create predictive models based on individual CpG sites and co-methylation modules. These models demonstrated high accuracy in estimating chronological age, with a median absolute error of 3 weeks (7.5% of median lifespan) for a clock based on methylation of individual CpGs and 1.5 weeks (3.7% of median lifespan) for an eigenvector-based clock. Our investigation extended to assessing epigenetic age acceleration in different strains and the potential resetting effect of regeneration on fin tissue. Notably, our models indicated that a shorter-lived strain has accelerated epigenetic aging and that regeneration does not reset, but may decelerate, epigenetic aging.
Additionally, we used longitudinal data to develop an "epigenetic timer" for direct prediction of individual lifespan based on fin biopsies and an eigenvector-based method, achieving a median absolute error of 4.5 weeks in the prediction of actual age at death. This surprising result underscores the existence of intrinsic determinants of lifespan established early in life. This study presents the first epigenetic clocks and lifespan predictors for N. furzeri, highlighting their potential as aging biomarkers and setting the stage for future research on life-extending interventions in this model organism.
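The accuracy figures in this abstract are median absolute errors. As a minimal Python sketch of that metric, assuming ages are given in weeks, the function below compares predicted and actual ages; the function name and the toy ages are hypothetical and do not come from the study.

```python
from statistics import median


def median_absolute_error(true_ages, predicted_ages):
    """Median of |true - predicted| across individuals (same units as the inputs)."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    return median(abs(t - p) for t, p in zip(true_ages, predicted_ages))


# Hypothetical chronological vs. predicted ages in weeks
true_weeks = [10, 20, 30]
predicted_weeks = [12, 17, 30]
print(median_absolute_error(true_weeks, predicted_weeks))  # 2
```

The median is less sensitive than the mean to a few badly predicted individuals, which is one common reason to report it for age predictors.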
Title : Epigenetic clock and lifespan prediction in the short-lived killifish Nothobranchius furzeri
Journal : bioRxiv - Genomics
Pub Date : 2024-08-07 DOI: 10.1101/2024.08.07.606967
Jeffrey Hyacinthe, Guillaume Bourque
Transposable elements (TEs) are DNA sequences able to create copies of themselves within the genome. Despite their limited expression due to silencing, TEs still manage to impact the host genome. For instance, some TEs have been shown to act as cis-regulatory elements and to be co-opted in the human genome. This suggests that the contributions of TEs to the host might come from their relationship with the epigenome rather than from their expression. However, a systematic analysis that relates TEs in the human genome directly to chromatin histone marks across distinct cell types is still lacking. Here we leverage a new dataset from the International Human Epigenome Consortium with 4867 uniformly processed ChIP-seq experiments for 6 histone marks across 175 annotated cell labels and show that TEs have drastically different enrichment levels across marks. Overall, we find that TEs are generally depleted in the H3K9me3 histone modification, except for L1s, while MIRs are highly enriched in H3K4me1, H3K27ac and H3K27me3 and Alus are enriched in H3K36me3. Furthermore, we present a generalised profile of the relationship between TE enrichment and TE age, which reveals a few TE families (Alu, MIR, L2) that diverge from the expected dynamics. We also find some significant differences in TE enrichment between cell types and that, in 20% of the cases, these enrichments were cell-type specific. We report that at least 4% of cell types with healthy and cancer samples featured significant differences. Notably, we identify 456 TE-cell type-histone triplet candidates with the strongest cell-type-specific enrichments. We show that many of these candidates are associated with relevant biological processes and genes expressed in the relevant cell type. These results further support a role for TEs in genome regulation and highlight novel associations between TEs and histone marks across cell types.
Title : Transposable elements impact the regulatory landscape through cell type specific epigenomic associations
Journal : bioRxiv - Genomics
Pub Date : 2024-08-07 DOI: 10.1101/2024.08.05.606616
John C. Aldrich, Lauren A. Vanderlinden, Thomas L. Jacobsen, Cheyret Wood, Laura M. Saba, Steven G. Britt
Background: An animal’s ability to discriminate between differing wavelengths of light (i.e., color vision) is mediated, in part, by a subset of photoreceptor cells that express opsins with distinct absorption spectra. In Drosophila R7 photoreceptors, expression of the rhodopsin molecules, Rh3 or Rh4, is determined by a stochastic process mediated by the transcription factor spineless. The goal of this study was to identify additional factors that regulate R7 cell fate and opsin choice using a Genome Wide Association Study (GWAS) paired with transcriptome analysis via RNA-Seq.
Title : Genome-Wide Association Study and transcriptome analysis reveals a complex gene network that regulates opsin gene expression and cell fate determination in Drosophila R7 photoreceptor cells
Journal : bioRxiv - Genomics
Pub Date : 2024-08-07 DOI: 10.1101/2024.08.07.606955
Jae Woo Baek, Songwon Lim, Nayeon Park, Byeongsop Song, Nikhil Kirtipal, Jens Nielsen, Adil Mardinoglu, Saeed Shoaie, Jae-il Kim, Jang Won Son, Ara Koh, Sunjae Lee
In recent years, the overuse of antibiotics has led to the emergence of antimicrobial resistant (AMR) bacteria. To evaluate the spread of AMR bacteria, the reservoir of AMR genes (resistome) has traditionally been identified from environmental samples, hospital environments, and human populations; however, the functional role of AMR bacteria in the human gut microbiome and their persistence within individuals have not been fully investigated. Here, we performed a strain-resolved, in-depth analysis of resistome changes by reconstructing a large number of metagenome-assembled genomes (MAGs) from the gut microbiomes of antibiotic-treated individuals. Interestingly, we identified two bacterial populations with different resistome profiles, extensively acquired antimicrobial resistant bacteria (EARB) and sporadically acquired antimicrobial resistant bacteria (SARB), and found that EARB showed broader drug resistance and a significant functional role in shaping individual microbiome composition after antibiotic treatment. Furthermore, longitudinal strain analysis revealed that EARB were inherently carried by individuals and can reemerge through strain switching in the human gut microbiome. Our data on the presence of AMR bacteria in the human gut microbiome provide a new avenue for controlling the spread of AMR bacteria in the human community.
{"title":"Extensively acquired antimicrobial resistant bacteria restructure the individual microbial community in post-antibiotic conditions","authors":"Jae Woo Baek, Songwon Lim, Nayeon Park, Byeongsop Song, Nikhil Kirtipal, Jens Nielsen, Adil Mardinoglu, Saeed Shoaie, Jae-il Kim, Jang Won Son, Ara Koh, Sunjae Lee","doi":"10.1101/2024.08.07.606955","DOIUrl":"https://doi.org/10.1101/2024.08.07.606955","url":null,"PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-07 DOI: 10.1101/2024.08.05.606465
Chenlei Hu, Mehdi Borji, Giovanni J. Marrero, Vipin Kumar, Jackson A. Weir, Sachin V. Kammula, Evan Z. Macosko, Fei Chen
Tissue organization arises from the coordinated molecular programs of cells. Spatial genomics maps cells and their molecular programs within the spatial context of tissues. However, current methods measure spatial information through imaging or direct registration, which often require specialized equipment and are limited in scale. Here, we developed an imaging-free spatial transcriptomics method that uses molecular diffusion patterns to computationally reconstruct spatial data. To do so, we use a simple experimental protocol on two-dimensional barcode arrays to establish an interaction network between barcodes via molecular diffusion. Sequencing these interactions generates a high-dimensional matrix of interactions between different spatial barcodes. We then perform dimensionality reduction to regenerate a two-dimensional manifold, which represents the spatial locations of the barcode arrays. Surprisingly, we found that the UMAP algorithm, with minimal modifications, can faithfully reconstruct the arrays. We demonstrated that this method is compatible, with high fidelity, with the capture-array-based spatial transcriptomics/genomics methods Slide-seq and Slide-tags. We systematically explored the fidelity of the reconstruction through comparisons with experimentally derived ground truth data, and demonstrated that reconstruction generates high-quality spatial genomics data. We also scaled this technique to reconstruct high-resolution spatial information over areas up to 1.2 centimeters. This computational reconstruction method effectively converts spatial genomics measurement into a molecular biology workflow, enabling spatial transcriptomics with high accessibility and scalability.
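The matrix-building step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sequenced diffusion event arrives as a pair of barcode strings; the function name `build_interaction_matrix` and the toy barcodes are hypothetical and are not taken from the paper. The sketch covers only the counting step, not the UMAP-based reconstruction.

```python
from collections import Counter


def build_interaction_matrix(read_pairs):
    """Count diffusion interactions between spatial barcodes.

    read_pairs: iterable of (barcode_a, barcode_b) tuples, one per sequenced
    interaction (an assumed input format, not the authors' file format).
    Returns (barcodes, matrix), where matrix[i][j] is the symmetric
    interaction count between barcodes[i] and barcodes[j].
    """
    counts = Counter()
    for a, b in read_pairs:
        if a == b:
            continue  # a barcode paired with itself says nothing about neighbors
        counts[frozenset((a, b))] += 1  # unordered pair, so (a, b) == (b, a)

    barcodes = sorted({bc for pair in counts for bc in pair})
    index = {bc: i for i, bc in enumerate(barcodes)}
    matrix = [[0] * len(barcodes) for _ in barcodes]
    for pair, n in counts.items():
        a, b = tuple(pair)
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return barcodes, matrix


# Toy input: three barcodes, four observed interactions.
pairs = [("AAC", "AAG"), ("AAG", "AAC"), ("AAG", "ACT"), ("AAC", "AAG")]
barcodes, matrix = build_interaction_matrix(pairs)
```

Each row of `matrix` is one barcode's interaction profile across all other barcodes. A dimensionality reduction method (UMAP, according to the abstract) would then embed these rows in two dimensions to estimate barcode positions.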
{"title":"Scalable imaging-free spatial genomics through computational reconstruction","authors":"Chenlei Hu, Mehdi Borji, Giovanni J. Marrero, Vipin Kumar, Jackson A. Weir, Sachin V. Kammula, Evan Z. Macosko, Fei Chen","doi":"10.1101/2024.08.05.606465","DOIUrl":"https://doi.org/10.1101/2024.08.05.606465","url":null,"PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"112 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-07 DOI: 10.1101/2024.08.07.607012
Antoine Szatkownik, Cyril Furtlehner, Guillaume Charpiat, Burak Yelmen, Flora Jay
Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained with raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), and a Generative Adversarial Network (GAN) learning in this reduced space. Using this framework, we generated genomic proxy datasets for very diverse human populations around the world. We compared the quality of AGs generated by our approach with AGs generated by the established models and report improvements in capturing population structure, linkage disequilibrium, and metrics related to privacy leakage. Furthermore, we developed a frugal model with orders of magnitude fewer parameters and comparable performance to larger models. For quality assessment, we also implemented a new evaluation metric based on information theory to measure local haplotypic diversity, showing that generative models yield higher diversity than real genomes. In addition, we addressed the shrinkage issue associated with PCA and generative modeling, examined its relation to the nearest neighbor resemblance metric, and proposed a resolution. Finally, we evaluated the effect of different binarization methods on the quality of the output AGs.
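The pipeline outlined in this abstract (project genotypes with PCA, generate in the reduced space, map back, binarize) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the GAN is replaced by a simple per-component Gaussian sampler as a placeholder, binarization uses a fixed 0.5 threshold (one of several possible choices), and all function names and the random toy haplotypes are hypothetical. It requires NumPy.

```python
import numpy as np


def fit_pca(genotypes, n_components):
    """Return the per-site mean and the top principal axes (rows of Vt)."""
    mean = genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(genotypes - mean, full_matrices=False)
    return mean, vt[:n_components]


def to_latent(genotypes, mean, components):
    """Project centered genotypes onto the principal axes."""
    return (genotypes - mean) @ components.T


def from_latent(latent, mean, components):
    """Map latent coordinates back to (continuous) genotype space."""
    return latent @ components + mean


def binarize(values, threshold=0.5):
    """Threshold continuous outputs to 0/1 alleles (assumed method)."""
    return (values >= threshold).astype(np.int8)


def sample_latent(latent, n_samples, rng):
    """Placeholder generator: independent Gaussians fitted per component.

    In the approach the abstract describes, a trained GAN would produce
    these latent samples instead.
    """
    mu = latent.mean(axis=0)
    sigma = latent.std(axis=0)
    return rng.normal(mu, sigma, size=(n_samples, latent.shape[1]))


rng = np.random.default_rng(0)
real = rng.integers(0, 2, size=(20, 12))  # toy data: 20 haplotypes x 12 sites
mean, components = fit_pca(real, n_components=4)
latent = to_latent(real, mean, components)
synthetic = binarize(from_latent(sample_latent(latent, 5, rng), mean, components))
```

The generator only ever sees `n_components` dimensions rather than one input per genomic site, which is the scalability argument the abstract makes for working in the reduced space.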
{"title":"Latent generative modeling of long genetic sequences with GANs","authors":"Antoine Szatkownik, Cyril Furtlehner, Guillaume Charpiat, Burak Yelmen, Flora Jay","doi":"10.1101/2024.08.07.607012","DOIUrl":"https://doi.org/10.1101/2024.08.07.607012","url":null,"PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"199 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}