Pub Date: 2024-09-19; DOI: 10.1101/2024.09.18.613467
Andrea R Watson, Wei-An Chen, Willem van Schaik, Jason M Norman
In 2016, the US Food and Drug Administration published guidance for the early development of live biotherapeutic products (LBPs). Of particular importance is the characterization of LBP strains and the potential transfer of antimicrobial resistance (AMR) genes to relevant microbial organisms in the recipient's microbiota. Van der Lelie et al. make unsupported claims that the LBP strain VE202-06 encodes a transferable vancomycin resistance element. Here we provide our analysis of the potential transfer of AMR by strain VE202-06. These data indicate that strain VE202-06 has no risk of transferring AMR to relevant microbial organisms.
Title: Genomic Analyses Suggest No Risk of Vancomycin Resistance Transfer by Strain VE202-06 (bioRxiv - Genomics)
Pub Date: 2024-09-18; DOI: 10.1101/2024.09.16.613340
Polina Tikanova, James Julian Ross, Andreas Hagmueller, Florian Puehringer, Pinelopi Pliota, Daniel Krogull, Valeria Stefania, Manuel Hunold, Alevtina Koreshova, Anja Koller, Ivanna Ostapchuk, Jacqueline Okweri, Joseph Gokcezade, Peter Duchek, Gang Dong, Eyal Ben-David, Alejandro Burga
To sustain life, molecular complexes require the concerted action of multiple proteins, each relying on one another to perform intricate tasks. However, how such interdependent protein interactions evolve in the first place is poorly understood. To address this, we investigated the origins of a group of fast-evolving genetic parasites, toxin-antidote elements, which boil down this dilemma to a simple question: what came first, the toxin or the antidote? By integrating quantitative genetics, biochemistry, and evolutionary genomics, we discovered that toxins and antidotes can arise simultaneously through the duplication of a regulatory module comprising an F-box protein in linkage to its substrate. Our findings provide one solution to the recurrent emergence of mutual dependence in protein complexes and illustrate in detail how complexity can swiftly arise from simplicity.
Title: A regulatory module driving the recurrent evolution of irreducible molecular complexes (bioRxiv - Genomics)
Pub Date: 2024-09-18; DOI: 10.1101/2024.09.16.613381
Hyungtai Sim, Geun-Ho Park, Woong-Yang Park, Se-Hoon Lee, Murim Choi
Background: While immune checkpoint inhibitors (ICIs) have been adopted as standard therapy for non-small cell lung cancer (NSCLC) patients, the factors that influence variable prognosis remain elusive. A deeper understanding is therefore needed of how germline variants regulate the transcriptomes of circulating immune cells in metastasis and ultimately influence immunotherapy outcomes. Methods: We collected peripheral blood mononuclear cells (PBMCs) from 73 ICI-treated NSCLC patients, conducted single-cell RNA sequencing, and called germline variants via SNP microarray. Determining expression quantitative trait loci (eQTL) elucidates genetic interactions between germline variants and gene expression. Using aggregation-based eQTL mapping and network analysis across eight blood cell types, we sought cell-type-specific and ICI-prognosis-dependent gene regulatory signatures. Results: Our sc-eQTL analysis identified 3,616 blood- and 702 lung-cancer-specific eGenes across eight major clusters and treatment conditions, highlighting the involvement of immune-related pathways. Network analysis revealed TBX21-EOMES regulon activity in CD8+ T cells and the enrichment of eQTLs in higher-centrality genes as predictive factors of ICI response. Conclusions: Our findings suggest that in the circulating immune cells of NSCLC patients, transcriptomic regulation differs in a cell-type- and treatment-specific manner. They further highlight the role of eQTLs as broad controllers of ICI-prognosis-predicting gene networks. The predictive networks and identification of eQTL contributions can lead to deeper understanding and personalized ICI therapy response prediction based on germline variants.
Title: Systemic CD8+ T cell effector signature predicts prognosis of lung cancer immunotherapy (bioRxiv - Genomics)
Pub Date: 2024-09-18; DOI: 10.1101/2024.09.13.612826
Peter Z Schall, Jennifer R S Meadows, Fabian Ramos-Almodovar, Jeffrey M Kidd
Background: The presence of mitochondrial sequences in the nuclear genome (Numts) confounds analyses of mitochondrial sequence variation and is a potential source of false positives in disease studies. To improve the analysis of mitochondrial variation in canines, we completed a systematic assessment of Numt content across genome assemblies, canine populations and the carnivore lineage. Results: Centering our analysis on the UU_Cfam_GSD_1.0/canFam4/Mischka assembly, a commonly used reference in dog genetic variation studies, we find a total of 321 Numts, located throughout the nuclear genome and encompassing the entire sequence of the mitochondria. Comparison to 14 canine genome assemblies identified 63 Numts with presence-absence dimorphism among dogs, wolves, and a coyote. Further, a subset of Numts were maintained across carnivore evolutionary time (arctic fox, polar bear, cat), with 8 sequences likely more than 10 million years old, and shared with the domestic cat. On a population level, using structural variant data from the Dog10K Consortium for 1,879 dogs and wolves, we identified 11 Numts that are absent in at least one sample as well as 53 Numts that are absent from the Mischka assembly. Conclusions: We highlight scenarios where the presence of Numts is a potentially confounding factor and provide an annotation of these sequences in canine genome assemblies. This resource will aid the identification and interpretation of polymorphisms in both somatic and germline mitochondrial studies in canines.
Title: Characterization of nuclear mitochondrial insertions in canine genome assemblies (bioRxiv - Genomics)
Pub Date: 2024-09-17; DOI: 10.1101/2024.09.12.612681
Silvia Minana-Posada, Cecile Lorrain, Bruce A. McDonald, Alice Feurtey
Adaptation to new climates poses a significant challenge for plant pathogens during range expansion, highlighting the importance of understanding their response to climate to accurately forecast future disease outbreaks. The wheat pathogen Zymoseptoria tritici is ubiquitous across most wheat production regions distributed across diverse climate zones. We explored the genetic architecture of thermal adaptation using a global collection of 411 Z. tritici strains that were phenotyped across a wide range of temperatures and then included in a genome-wide association study. Our analyses provided evidence for local thermal adaptation in Z. tritici populations worldwide, with a significant positive correlation between bioclimatic variables and optimal growth temperatures. We also found a high variability in thermal performance among Z. tritici strains coming from the same field populations, reflecting the high evolutionary potential of this pathogen at the field scale. We identified 69 genes putatively involved in thermal adaptation, including one high-confidence candidate potentially involved in cold adaptation. These results highlight the complex polygenic nature of thermal adaptation in Z. tritici and suggest that this pathogen is likely to adapt well when confronted with climate change.
Title: Thermal adaptation in worldwide collections of a major fungal pathogen (bioRxiv - Genomics)
Pub Date: 2024-09-17; DOI: 10.1101/2024.09.09.612110
Chirag Nepal, Wanqiu Chen, Zhong Chen, John A Wroble, Ling Xie, Wenjing Liao, Chunlin Xiao, Andrew Farmer, Malcolm Moos, Wendell Jones, Xian Chen, Charles Wang
A variety of newly developed next-generation sequencing technologies are rapidly making their way into research and clinical applications, for which accuracy and cross-lab reproducibility are critical and reference standards are much needed. Our previous multicenter studies under the SEQC-2 umbrella, using a breast cancer cell line with a paired B-cell line, produced a large amount of genomic data, including whole genome sequencing (Illumina, PacBio, Nanopore), HiC, and scRNA-seq, with detailed analyses of somatic mutations, single-nucleotide variations (SNVs), and structural variations (SVs). However, there is still a lack of well-characterized reference materials that include epigenomic and proteomic data. Here we further performed ATAC-seq, Methyl-seq, RNA-seq, and proteomic analyses and provide a comprehensive catalog of the epigenomic landscape, which overlapped with the transcriptomes and proteomes of the two cell lines. We identified >7,700 peptide isoforms, where the majority (95%) of the genes had a single peptide isoform. Protein expression of the transcripts overlapping CGIs was much higher than that of the non-CGI transcripts in both cell lines. We further present evidence that certain SNVs were incorporated into mutated peptides. We observed that open chromatin regions had low methylation, which was largely regulated by CG density: CG-rich regions had more accessible chromatin, low methylation, and higher gene and protein expression. The CG-poor regions had more repressive epigenetic regulation (higher DNA methylation) and less open chromatin, resulting in cell-line-specific methylation and gene expression patterns. Our studies provide well-defined reference materials consisting of two cell lines with genomic, epigenomic, transcriptomic, scRNA-seq and proteomic characterizations, which can serve as standards for validating and benchmarking not only various omics assays but also bioinformatics methods. They will be a valuable resource for both research and clinical communities.
Title: Epigenomic, transcriptomic and proteomic characterizations of reference samples (bioRxiv - Genomics)
Pub Date: 2024-09-17; DOI: 10.1101/2024.09.12.612639
Arnab Kumar Khan, Tanushree Haldar, Arunabha Majumdar
Transcriptome-wide association study (TWAS) has shed light on molecular mechanisms by examining the roles of genes in complex disease etiology. TWAS relies on gene expression mapping in a reference panel of transcriptomic data to build a prediction model that identifies expression quantitative trait loci (eQTLs) affecting gene expression. These eQTLs are then used to construct the genetically regulated gene expression (GReX) in the GWAS data, and a test between the imputed GReX and the trait indicates gene-trait association. Such a two-step approach ignores the uncertainty of the predicted expression and can lead to reduced inference accuracy, e.g., inflated type-I error in TWAS. To circumvent the two-step approach, we develop a unified Bayesian method for TWAS that models the two datasets simultaneously. We use the horseshoe prior in the transcriptome data when modeling the relationship between gene expression and local SNPs, and the spike-and-slab prior when testing for an association between the GReX and the trait. We extend our approach to multi-ancestry TWAS, focusing on discovering genes that affect the trait in all ancestries. We show through simulation that our method gives better estimation accuracy for the GReX effect size than other methods. In real data, applying our method to the GEUVADIS expression study and the GWAS data from the UK Biobank revealed several novel genes associated with the trait body mass index (BMI).
Title: A unified Bayesian approach to transcriptome-wide association study (bioRxiv - Genomics)
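The conventional two-step TWAS that the abstract above contrasts with its unified model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' Bayesian method: ridge regression stands in for the expression prediction model, the data are simulated, and every name and parameter value here (`fit_expression_weights`, `association_z`, sample sizes, effect sizes) is hypothetical.

```python
import numpy as np


def fit_expression_weights(genotypes, expression, ridge_penalty=1.0):
    """Step 1: ridge regression of expression on local SNPs (reference panel)."""
    n_snps = genotypes.shape[1]
    gram = genotypes.T @ genotypes + ridge_penalty * np.eye(n_snps)
    return np.linalg.solve(gram, genotypes.T @ expression)


def association_z(predicted_expression, trait):
    """Step 2: z-score for the slope of trait on imputed GReX (simple linear regression)."""
    x = predicted_expression - predicted_expression.mean()
    y = trait - trait.mean()
    slope = (x @ y) / (x @ x)
    residuals = y - slope * x
    sigma2 = (residuals @ residuals) / (len(y) - 2)
    return slope / np.sqrt(sigma2 / (x @ x))


# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_ref, n_gwas, n_snps = 300, 2000, 20
true_weights = np.zeros(n_snps)
true_weights[:3] = [0.5, -0.4, 0.3]

# Reference panel: genotypes and measured expression.
ref_geno = rng.binomial(2, 0.3, size=(n_ref, n_snps)).astype(float)
ref_geno -= ref_geno.mean(axis=0)
ref_expr = ref_geno @ true_weights + rng.normal(size=n_ref)

# GWAS cohort: genotypes and trait, but no measured expression.
gwas_geno = rng.binomial(2, 0.3, size=(n_gwas, n_snps)).astype(float)
gwas_geno -= gwas_geno.mean(axis=0)
gwas_trait = 0.3 * (gwas_geno @ true_weights) + rng.normal(size=n_gwas)

weights = fit_expression_weights(ref_geno, ref_expr)   # step 1
imputed_grex = gwas_geno @ weights                      # impute GReX in the GWAS cohort
z = association_z(imputed_grex, gwas_trait)             # step 2
print(f"TWAS z-score: {z:.2f}")
```

Note that step 2 treats `imputed_grex` as if it were observed without error. That is the ignored prediction uncertainty the abstract refers to, and it is what a joint model over both datasets is meant to address.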
Pub Date: 2024-09-17; DOI: 10.1101/2024.08.27.607709
Kouhei Toga, Fumiko Kimoto, Hiroki Fujii, Hidemasa Bono
Insecticide resistance in the bed bug Cimex lectularius is poorly understood due to the lack of genome sequences for resistant strains. In Japan, we identified a resistant strain of C. lectularius that exhibits a higher pyrethroid resistance ratio than many previously discovered strains. We sequenced the genomes of the pyrethroid-resistant and susceptible strains using long-read sequencing, resulting in highly contiguous assemblies (N50 of resistant strain: 2.1 Mb; N50 of susceptible strain: 1.5 Mb). Gene prediction was performed with BRAKER3 and functional annotation with the Fanflow4insects workflow. Next, we compared the amino acid sequences of the two strains to identify gene mutations, finding 729 mutated transcripts that were specific to the resistant strain. These included genes previously defined as resistance genes. Additionally, enrichment analysis implicated DNA damage response, cell cycle regulation, insulin metabolism, and lysosomes in the development of pyrethroid resistance. Genome editing of these genes can provide insights into the evolution and mechanisms of insecticide resistance. This study expands the set of target genes for monitoring allele distribution and frequency changes, which will likely contribute to the assessment of resistance levels. These findings highlight the potential of genome-wide approaches to understand insecticide resistance in bed bugs.
Title: Genome-wide Search for Gene Mutations likely Conferring Insecticide Resistance in the Common Bed Bug, Cimex lectularius (bioRxiv - Genomics)
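The abstract above reports assembly contiguity as N50. As a reminder of what that statistic measures, here is a minimal sketch using the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # half of the assembly is now covered
            return length


# Toy example (hypothetical lengths): total is 290, so half is 145.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # 80 + 70 = 150 >= 145, so N50 is 70
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why it is used as a contiguity summary. It says nothing about the correctness of the assembly.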
Pub Date: 2024-09-17; DOI: 10.1101/2024.09.12.612544
Xavier Roca-Rada, Roberta Davidson, Matthew P. Williams, Shyamsundar Ravishankar, Evelyn Collen, Christian Haarkotter, Leonard Taufik, Antonio Faustino Carvalho, Vanessa Villalba-Mouco, Daniel R. Cuesta-Aguirre, Catarina Tente, Alvaro M. Monge Calleja, Rebecca Anne MacRoberts, Linda Melo, Gludhug A. Purnomo, Yassine Souilmi, Raymond Tobler, Eugenia Cunha, Sofia Tereso, Vitor M. J. Matos, Teresa Matos Fernandes, Anne-France Mauer, Ana Maria Silva, Pedro C. Carvalho, Bastien Llamas, Joao C. Teixeira
Background: Recent ancient DNA studies uncovering large-scale demographic events in Iberia have focused primarily on Spain, with limited reports for Portugal, a country located at the westernmost edge of continental Eurasia. Here, we introduce the largest collection of ancient Portuguese genomic datasets (n = 68) to date, spanning 5,000 years, from the Neolithic to the 19th century. Results: We found evidence of patrilocality in Neolithic Portugal, with admixture from local hunter-gatherers and Anatolian farmers, and persistence of Upper Paleolithic Magdalenian ancestry. This genetic profile persists into the Chalcolithic, reflecting diverse local hunter-gatherer contributions. During the Bronze Age, local genetic ancestry persisted, particularly in southern Iberia, despite influences from the North Pontic Steppe and early Mediterranean contacts. The Roman period highlights Idanha-a-Velha as a hub of migration and interaction, with a notably diverse genetic profile. The Early Medieval period is marked by Central European ancestry linked to Suebi/Visigoth migrations, adding to coeval local, African, and Mediterranean influences. The Islamic and Christian Conquest periods show strong genetic continuity in northern Portugal and significant African admixture in the south, with persistent Jewish and Islamic ancestries suggesting enduring influences in the post-Islamic period. Conclusions: This study represents the first attempt to reconstruct the genetic history of Portugal from the analysis of ancient individuals. We reveal dynamic patterns of migration and cultural exchange across millennia, but also the persistence of local ancestries. Our findings integrate genetic information with historical and archaeological data, enhancing our understanding of Iberia's ancient heritage.
{"title":"The genetic history of Portugal over the past 5,000 years","authors":"Xavier Roca-Rada, Roberta Davidson, Matthew P. Williams, Shyamsundar Ravishankar, Evelyn Collen, Christian Haarkotter, Leonard Taufik, Antonio Faustino Carvalho, Vanessa Villalba-Mouco, Daniel R. Cuesta-Aguirre, Catarina Tente, Alvaro M. Monge Calleja, Rebecca Anne MacRoberts, Linda Melo, Gludhug A. Purnomo, Yassine Souilmi, Raymond Tobler, Eugenia Cunha, Sofia Tereso, Vitor M. J. Matos, Teresa Matos Fernandes, Anne-France Mauer, Ana Maria Silva, Pedro C. Carvalho, Bastien Llamas, Joao C. Teixeira","doi":"10.1101/2024.09.12.612544","DOIUrl":"https://doi.org/10.1101/2024.09.12.612544","journal":{"name":"bioRxiv - Genomics"},"publicationDate":"2024-09-17","publicationTypes":"Journal Article"}
Pub Date : 2024-09-16 DOI: 10.1101/2024.09.13.612795
Tamara A Potapova, Paxton Kostos, Sean A McKinney, Matthew Borchers, Jeffrey S Haug, Andrea Guarracino, Steven Solar, Madelaine M Gogol, Graciela Monfort Anez, Leonardo Gomes de Lima, Yan Wang, Kate E. Hall, Sophie Hoffman, Erik Garrison, Adam M. Phillippy, Jennifer L. Gerton
Ribosomal RNA (rRNA) genes exist in multiple copies arranged in tandem arrays known as ribosomal DNA (rDNA). The total number of gene copies is variable, and the mechanisms buffering this copy number variation remain unresolved. We surveyed the number, distribution, and activity of rDNA arrays at the level of individual chromosomes across multiple human and primate genomes. Each individual possessed a unique fingerprint of copy number distribution and activity of rDNA arrays. In some cases, entire rDNA arrays were transcriptionally silent. Silent rDNA arrays showed reduced association with the nucleolus and decreased interchromosomal interactions, indicating that the nucleolar organizer function of rDNA depends on transcriptional activity. Methyl-sequencing of flow-sorted chromosomes, combined with long-read sequencing, showed epigenetic modification of the rDNA promoter and coding region by DNA methylation. Silent arrays were in a closed chromatin state, as indicated by accessibility profiles derived from Fiber-seq. Removing DNA methylation restored the transcriptional activity of silent arrays. Array activity status remained stable through iPS cell reprogramming. Family trio analysis demonstrated that the inactive rDNA haplotype can be traced to one of the parental genomes, suggesting that the epigenetic state of rDNA arrays may be heritable. We propose that the dosage of rRNA genes is epigenetically regulated by DNA methylation, and that these methylation patterns specify nucleolar organizer function and can propagate transgenerationally.
{"title":"Epigenetic control and inheritance of rDNA arrays","authors":"Tamara A Potapova, Paxton Kostos, Sean A McKinney, Matthew Borchers, Jeffrey S Haug, Andrea Guarracino, Steven Solar, Madelaine M Gogol, Graciela Monfort Anez, Leonardo Gomes de Lima, Yan Wang, Kate E. Hall, Sophie Hoffman, Erik Garrison, Adam M. Phillippy, Jennifer L. Gerton","doi":"10.1101/2024.09.13.612795","DOIUrl":"https://doi.org/10.1101/2024.09.13.612795","journal":{"name":"bioRxiv - Genomics"},"publicationDate":"2024-09-16","publicationTypes":"Journal Article"}