Background: The massive structural variations and frequent introgression highly contribute to the genetic diversity of wheat, while the huge and complex genome of polyploid wheat hinders efficient genotyping of abundant varieties towards accurate identification, management, and exploitation of germplasm resources.
Results: We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating massive varieties, with the accuracy validated by PCR assay. We then construct a digitalized genotyping CNVb map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and 2NvS translocation, and the beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers promote a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP array for applications such as evaluating new varieties, efficient management of collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb ( http://wheat.cau.edu.cn/WheatCNVb/ ), for exploring the CNVb profiles over ever-increasing wheat accessions, and also propose a QR-code-like representation of individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties.
Conclusions: The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.
{"title":"Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing.","authors":"Jianxia Niu, Wenxi Wang, Zihao Wang, Zhe Chen, Xiaoyu Zhang, Zhen Qin, Lingfeng Miao, Zhengzhao Yang, Chaojie Xie, Mingming Xin, Huiru Peng, Yingyin Yao, Jie Liu, Zhongfu Ni, Qixin Sun, Weilong Guo","doi":"10.1186/s13059-024-03315-6","DOIUrl":"10.1186/s13059-024-03315-6","url":null,"abstract":"<p><strong>Background: </strong>The massive structural variations and frequent introgression highly contribute to the genetic diversity of wheat, while the huge and complex genome of polyploid wheat hinders efficient genotyping of abundant varieties towards accurate identification, management, and exploitation of germplasm resources.</p><p><strong>Results: </strong>We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating massive varieties, with the accuracy validated by PCR assay. We then construct a digitalized genotyping CNVb map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and 2N<sup>v</sup>S translocation, and the beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers promote a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP array for applications such as evaluating new varieties, efficient management of collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb ( http://wheat.cau.edu.cn/WheatCNVb/ ), for exploring the CNVb profiles over ever-increasing wheat accessions, and also propose a QR-code-like representation of individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties.</p><p><strong>Conclusions: </strong>The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"171"},"PeriodicalIF":12.3,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218387/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141477776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-01DOI: 10.1186/s13059-024-03312-9
Kevin Lamkiewicz, Lisa-Marie Barf, Konrad Sachse, Martin Hölzer
Microbial pangenome analysis identifies present or absent genes in prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic orders such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.
{"title":"RIBAP: a comprehensive bacterial core genome annotation pipeline for pangenome calculation beyond the species level.","authors":"Kevin Lamkiewicz, Lisa-Marie Barf, Konrad Sachse, Martin Hölzer","doi":"10.1186/s13059-024-03312-9","DOIUrl":"10.1186/s13059-024-03312-9","url":null,"abstract":"<p><p>Microbial pangenome analysis identifies present or absent genes in prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic orders such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"170"},"PeriodicalIF":12.3,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218241/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141477775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-01DOI: 10.1186/s13059-024-03314-7
Daniel R Tabet, Da Kuang, Megan C Lancaster, Roujia Li, Karen Liu, Jochen Weile, Atina G Coté, Yingzhou Wu, Robert A Hegele, Dan M Roden, Frederick P Roth
Background: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts.
Results: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation.
Conclusion: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
{"title":"Benchmarking computational variant effect predictors by their ability to infer human traits.","authors":"Daniel R Tabet, Da Kuang, Megan C Lancaster, Roujia Li, Karen Liu, Jochen Weile, Atina G Coté, Yingzhou Wu, Robert A Hegele, Dan M Roden, Frederick P Roth","doi":"10.1186/s13059-024-03314-7","DOIUrl":"10.1186/s13059-024-03314-7","url":null,"abstract":"<p><strong>Background: </strong>Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts.</p><p><strong>Results: </strong>AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation.</p><p><strong>Conclusion: </strong>We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"172"},"PeriodicalIF":12.3,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141477774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-31DOI: 10.1186/s13059-024-03175-0
Elena M Pugacheva, Dharmendra Nath Bhatt, Samuel Rivero-Hinojosa, Md Tajmul, Liron Fedida, Emma Price, Yon Ji, Dmitri Loukinov, Alexander V Strunnikov, Bing Ren, Victor V Lobanenkov
Background: Pervasive usage of alternative promoters leads to the deregulation of gene expression in carcinogenesis and may drive the emergence of new genes in spermatogenesis. However, little is known regarding the mechanisms underpinning the activation of alternative promoters.
Results: Here we describe how alternative cancer-testis-specific transcription is activated. We show that intergenic and intronic CTCF binding sites, which are transcriptionally inert in normal somatic cells, could be epigenetically reprogrammed into active de novo promoters in germ and cancer cells. BORIS/CTCFL, the testis-specific paralog of the ubiquitously expressed CTCF, triggers the epigenetic reprogramming of CTCF sites into units of active transcription. BORIS binding initiates the recruitment of the chromatin remodeling factor, SRCAP, followed by the replacement of H2A histone with H2A.Z, resulting in a more relaxed chromatin state in the nucleosomes flanking the CTCF binding sites. The relaxation of chromatin around CTCF binding sites facilitates the recruitment of multiple additional transcription factors, thereby activating transcription from a given binding site. We demonstrate that the epigenetically reprogrammed CTCF binding sites can drive the expression of cancer-testis genes, long noncoding RNAs, retro-pseudogenes, and dormant transposable elements.
Conclusions: Thus, BORIS functions as a transcription factor that epigenetically reprograms clustered CTCF binding sites into transcriptional start sites, promoting transcription from alternative promoters in both germ cells and cancer cells.
{"title":"BORIS/CTCFL epigenetically reprograms clustered CTCF binding sites into alternative transcriptional start sites.","authors":"Elena M Pugacheva, Dharmendra Nath Bhatt, Samuel Rivero-Hinojosa, Md Tajmul, Liron Fedida, Emma Price, Yon Ji, Dmitri Loukinov, Alexander V Strunnikov, Bing Ren, Victor V Lobanenkov","doi":"10.1186/s13059-024-03175-0","DOIUrl":"10.1186/s13059-024-03175-0","url":null,"abstract":"<p><strong>Background: </strong>Pervasive usage of alternative promoters leads to the deregulation of gene expression in carcinogenesis and may drive the emergence of new genes in spermatogenesis. However, little is known regarding the mechanisms underpinning the activation of alternative promoters.</p><p><strong>Results: </strong>Here we describe how alternative cancer-testis-specific transcription is activated. We show that intergenic and intronic CTCF binding sites, which are transcriptionally inert in normal somatic cells, could be epigenetically reprogrammed into active de novo promoters in germ and cancer cells. BORIS/CTCFL, the testis-specific paralog of the ubiquitously expressed CTCF, triggers the epigenetic reprogramming of CTCF sites into units of active transcription. BORIS binding initiates the recruitment of the chromatin remodeling factor, SRCAP, followed by the replacement of H2A histone with H2A.Z, resulting in a more relaxed chromatin state in the nucleosomes flanking the CTCF binding sites. The relaxation of chromatin around CTCF binding sites facilitates the recruitment of multiple additional transcription factors, thereby activating transcription from a given binding site. We demonstrate that the epigenetically reprogrammed CTCF binding sites can drive the expression of cancer-testis genes, long noncoding RNAs, retro-pseudogenes, and dormant transposable elements.</p><p><strong>Conclusions: </strong>Thus, BORIS functions as a transcription factor that epigenetically reprograms clustered CTCF binding sites into transcriptional start sites, promoting transcription from alternative promoters in both germ cells and cancer cells.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"40"},"PeriodicalIF":12.3,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832218/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139651980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-31DOI: 10.1186/s13059-024-03171-4
Sarah Fazal, Matt C Danzi, Isaac Xu, Shilpa Nadimpalli Kobren, Shamil Sunyaev, Chloe Reuter, Shruti Marwaha, Matthew Wheeler, Egor Dolzhenko, Francesca Lucas, Stefan Wuchty, Mustafa Tekin, Stephan Züchner, Vanessa Aguiar-Pulido
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
{"title":"RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci.","authors":"Sarah Fazal, Matt C Danzi, Isaac Xu, Shilpa Nadimpalli Kobren, Shamil Sunyaev, Chloe Reuter, Shruti Marwaha, Matthew Wheeler, Egor Dolzhenko, Francesca Lucas, Stefan Wuchty, Mustafa Tekin, Stephan Züchner, Vanessa Aguiar-Pulido","doi":"10.1186/s13059-024-03171-4","DOIUrl":"10.1186/s13059-024-03171-4","url":null,"abstract":"<p><p>Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"39"},"PeriodicalIF":12.3,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832122/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139651982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-31DOI: 10.1186/s13059-024-03170-5
Alice Antonello, Riccardo Bergamin, Nicola Calonaci, Jacob Househam, Salvatore Milite, Marc J Williams, Fabio Anselmi, Alberto d'Onofrio, Vasavi Sundaram, Alona Sosinsky, William C H Cross, Giulio Caravagna
Copy number alterations (CNAs) are among the most important genetic events in cancer, but their detection from sequencing data is challenging because of unknown sample purity, tumor ploidy, and general intra-tumor heterogeneity. Here, we present CNAqc, an evolution-inspired method to perform the computational validation of clonal and subclonal CNAs detected from bulk DNA sequencing. CNAqc is validated using single-cell data and simulations, is applied to over 4000 TCGA and PCAWG samples, and is incorporated into the validation process for the clinically accredited bioinformatics pipeline at Genomics England. CNAqc is designed to support automated quality control procedures for tumor somatic data validation.
{"title":"Computational validation of clonal and subclonal copy number alterations from bulk tumor sequencing using CNAqc.","authors":"Alice Antonello, Riccardo Bergamin, Nicola Calonaci, Jacob Househam, Salvatore Milite, Marc J Williams, Fabio Anselmi, Alberto d'Onofrio, Vasavi Sundaram, Alona Sosinsky, William C H Cross, Giulio Caravagna","doi":"10.1186/s13059-024-03170-5","DOIUrl":"10.1186/s13059-024-03170-5","url":null,"abstract":"<p><p>Copy number alterations (CNAs) are among the most important genetic events in cancer, but their detection from sequencing data is challenging because of unknown sample purity, tumor ploidy, and general intra-tumor heterogeneity. Here, we present CNAqc, an evolution-inspired method to perform the computational validation of clonal and subclonal CNAs detected from bulk DNA sequencing. CNAqc is validated using single-cell data and simulations, is applied to over 4000 TCGA and PCAWG samples, and is incorporated into the validation process for the clinically accredited bioinformatics pipeline at Genomics England. CNAqc is designed to support automated quality control procedures for tumor somatic data validation.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"25 1","pages":"38"},"PeriodicalIF":12.3,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10832148/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139651981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dissemination of circulating tumor cells at night: role of sleep or circadian rhythm?","authors":"Yves Dauvilliers, Frédéric Thomas, Catherine Alix-Panabières","doi":"10.1186/s13059-022-02791-y","DOIUrl":"https://doi.org/10.1186/s13059-022-02791-y","url":null,"abstract":"","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"23 1","pages":"214"},"PeriodicalIF":12.3,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563132/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33539845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-10-12DOI: 10.1186/s13059-022-02782-z
Seonghyun Lee, Hyunji Lee, Gayoung Baek, Eunji Namgung, Joo Min Park, Sanghun Kim, Seongho Hong, Jin-Soo Kim
We present two methods for enhancing the efficiency of mitochondrial DNA (mtDNA) editing in mice with DddA-derived cytosine base editors (DdCBEs). First, we fused DdCBEs to a nuclear export signal (DdCBE-NES) to avoid off-target C-to-T conversions in the nuclear genome and improve editing efficiency in mtDNA. Second, mtDNA-targeted TALENs (mitoTALENs) are co-injected into mouse embryos to cleave unedited mtDNA. We generated a mouse model with the m.G12918A mutation in the MT-ND5 gene, associated with mitochondrial genetic disorders in humans. The mutant mice show hunched appearances, damaged mitochondria in kidney and brown adipose tissues, and hippocampal atrophy, resulting in premature death.
{"title":"Enhanced mitochondrial DNA editing in mice using nuclear-exported TALE-linked deaminases and nucleases.","authors":"Seonghyun Lee, Hyunji Lee, Gayoung Baek, Eunji Namgung, Joo Min Park, Sanghun Kim, Seongho Hong, Jin-Soo Kim","doi":"10.1186/s13059-022-02782-z","DOIUrl":"10.1186/s13059-022-02782-z","url":null,"abstract":"<p><p>We present two methods for enhancing the efficiency of mitochondrial DNA (mtDNA) editing in mice with DddA-derived cytosine base editors (DdCBEs). First, we fused DdCBEs to a nuclear export signal (DdCBE-NES) to avoid off-target C-to-T conversions in the nuclear genome and improve editing efficiency in mtDNA. Second, mtDNA-targeted TALENs (mitoTALENs) are co-injected into mouse embryos to cleave unedited mtDNA. We generated a mouse model with the m.G12918A mutation in the MT-ND5 gene, associated with mitochondrial genetic disorders in humans. The mutant mice show hunched appearances, damaged mitochondria in kidney and brown adipose tissues, and hippocampal atrophy, resulting in premature death.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"23 1","pages":"211"},"PeriodicalIF":12.3,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9554978/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33503033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-10-04DOI: 10.1186/s13059-022-02771-2
Ivar Grytten, Knut Dagestad Rand, Geir Kjetil Sandve
Genotyping is a core application of high-throughput sequencing. We present KAGE, a genotyper for SNPs and short indels that is inspired by recent developments within graph-based genome representations and alignment-free methods. KAGE uses a pan-genome representation of the population to efficiently and accurately predict genotypes. Two novel ideas improve both the speed and accuracy: a Bayesian model incorporates genotypes from thousands of individuals to improve prediction accuracy, and a computationally efficient method leverages correlation between variants. We show that the accuracy of KAGE is at par with the best existing alignment-free genotypers, while being an order of magnitude faster.
{"title":"KAGE: fast alignment-free graph-based genotyping of SNPs and short indels.","authors":"Ivar Grytten, Knut Dagestad Rand, Geir Kjetil Sandve","doi":"10.1186/s13059-022-02771-2","DOIUrl":"https://doi.org/10.1186/s13059-022-02771-2","url":null,"abstract":"<p><p>Genotyping is a core application of high-throughput sequencing. We present KAGE, a genotyper for SNPs and short indels that is inspired by recent developments within graph-based genome representations and alignment-free methods. KAGE uses a pan-genome representation of the population to efficiently and accurately predict genotypes. Two novel ideas improve both the speed and accuracy: a Bayesian model incorporates genotypes from thousands of individuals to improve prediction accuracy, and a computationally efficient method leverages correlation between variants. We show that the accuracy of KAGE is at par with the best existing alignment-free genotypers, while being an order of magnitude faster.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"23 1","pages":"209"},"PeriodicalIF":12.3,"publicationDate":"2022-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9531401/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33489028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}