Yi Ling Tam, Sarah Cameron, Andrew Preston, Lauren Cowley
Microbial Genomics, volume 10, issue 7. Published 2024-07-01. DOI: 10.1099/mgen.0.001268. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11316554/pdf/
GWarrange: a pre- and post- genome-wide association studies pipeline for detecting phenotype-associated genome rearrangement events.
The use of k-mers to capture genetic variation in bacterial genome-wide association studies (bGWAS) has demonstrated its effectiveness in overcoming the plasticity of bacterial genomes by providing a comprehensive array of genetic variants in a genome set that is not confined to a single reference genome. However, little attempt has been made to interpret k-mers in the context of genome rearrangements, partly due to challenges in the exhaustive and high-throughput identification of genome structure and individual rearrangement events. Here, we present GWarrange, a pre- and post-bGWAS processing methodology that leverages the unique properties of k-mers to facilitate bGWAS for genome rearrangements. Repeat sequences are common instigators of genome rearrangements through intragenomic homologous recombination, and they are commonly found at rearrangement boundaries. Using whole-genome sequences, repeat sequences are replaced by short placeholder sequences, allowing the regions flanking repeats to be incorporated into relatively short k-mers. Then, locations of flanking regions in significant k-mers are mapped back to complete genome sequences to visualise genome rearrangements. Four case studies based on two bacterial species (Bordetella pertussis and Enterococcus faecium) and a simulated genome set are presented to demonstrate the ability to identify phenotype-associated rearrangements. GWarrange is available at https://github.com/DorothyTamYiLing/GWarrange.
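The repeat-masking idea described in the abstract can be illustrated with a small Python sketch. This is not the GWarrange implementation: the function names, the toy sequences, the `NNN` placeholder and the k-mer length are all assumptions chosen for illustration, and the toy ignores repeat orientation and reverse-strand matches. It shows how replacing a repeat with a short placeholder lets one short k-mer carry both flanking regions, and how those flanks can then be located in the unmasked genome.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mask_repeats(genome: str, repeat: str, placeholder: str) -> str:
    """Replace every exact forward-strand copy of `repeat` with `placeholder`."""
    return genome.replace(repeat, placeholder)


def spanning_kmers(masked: str, placeholder: str, k: int) -> set[str]:
    """Collect k-mers that contain the placeholder with sequence on both sides."""
    hits = set()
    for i in range(len(masked) - k + 1):
        kmer = masked[i:i + k]
        pos = kmer.find(placeholder)
        # Keep the k-mer only if at least one base flanks the placeholder on each side.
        if pos > 0 and pos + len(placeholder) < k:
            hits.add(kmer)
    return hits


def locate_flanks(kmer: str, genome: str, placeholder: str) -> tuple[int, int]:
    """Find the first forward-strand position of each flank in the unmasked genome.

    Returns -1 for a flank that is not found. Real flanks would be much longer
    than in this toy, which is what makes such a lookup unambiguous.
    """
    left, _, right = kmer.partition(placeholder)
    return genome.find(left), genome.find(right)


# Hypothetical toy data: genome B carries an inversion of the segment
# that sits between two copies of the repeat in genome A.
LEFT, MIDDLE, RIGHT = "AACCGGTA", "CTTGACGA", "TGCATCAG"
REPEAT = "GATTACAGATTACA"
PLACEHOLDER = "NNN"
K = 11

genome_a = LEFT + REPEAT + MIDDLE + REPEAT + RIGHT
genome_b = LEFT + REPEAT + revcomp(MIDDLE) + REPEAT + RIGHT

kmers_a = spanning_kmers(mask_repeats(genome_a, REPEAT, PLACEHOLDER), PLACEHOLDER, K)
kmers_b = spanning_kmers(mask_repeats(genome_b, REPEAT, PLACEHOLDER), PLACEHOLDER, K)

# k-mers present in only one genome are candidates for marking a rearrangement boundary.
only_in_a = kmers_a - kmers_b
print(sorted(only_in_a))
print(locate_flanks("CGGTANNNCTT", genome_a, PLACEHOLDER))
```

In this toy, the k-mer `CGGTANNNCTT` joins the left flank to the start of the middle segment, so it appears only in the genome without the inversion. In a real analysis, presence or absence of such k-mers across many genomes would be the input to the association test, which is outside the scope of this sketch.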
About the journal:
Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.