A Protocol to Extract a Specific Genomic Region from a Public Whole-Genome Database and Modify Analytical Bin Length for Population Genetic Studies.

Methods and Protocols (IF 2.0, JCR Q3, Biochemical Research Methods) · Pub Date: 2024-07-27 · DOI: 10.3390/mps7040057

Muhammad Shoaib Akhtar, Shoji Kawamura

Volume 7, Issue 4. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11357298/pdf/


Abstract

With the advent of next-generation sequencing and the continuing fall in sequencing costs, a growing volume of genomic data has become available, including whole-genome, whole-exome, and targeted sequencing data. These approaches are used not only in large sequencing projects, such as the 1000 Genomes Project and UK Biobank, but also by individual researchers. Evolutionary genetic analyses, such as the dN/dS ratio and Tajima's D, are increasingly in demand for whole-genome population data. These analyses are often carried out with a uniform, user-defined bin size across the genome. However, they require subdividing a genomic region into functional units, such as protein-coding regions, introns, and untranslated regions, and computing these measures on large-scale data remains challenging. In a recent investigation, we devised a method to address this issue. The method requires a multi-sample VCF file containing population data, a reference genome, a BED file of target regions, and a list of samples to include in the analysis. Because the target regions are extracted into a new VCF file, population genetic analysis can then be restricted to those regions. We applied this approach to Tajima's D analysis of intact genes and pseudogenes, as well as non-coding regions.
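To make the idea concrete, the following is a minimal Python sketch of region-restricted analysis: it keeps the biallelic sites of a multi-sample VCF that fall inside one BED interval for a chosen sample list, then computes Tajima's D for that interval. It is not the authors' pipeline, and the abstract does not name the tools they used. The function names `region_alt_counts` and `tajimas_d` are hypothetical, and the sketch assumes diploid genotypes, skips multiallelic sites and sites with missing calls, and does not use the reference genome.

```python
import math

def region_alt_counts(vcf_lines, chrom, start, end, samples):
    """Alt-allele counts at biallelic sites inside one BED interval.

    BED intervals are 0-based and half-open; VCF POS is 1-based, so a
    site belongs to the interval when start < POS <= end.
    """
    cols, counts = [], []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Locate the columns of the requested samples in the header.
            cols = [fields.index(s) for s in samples]
            continue
        pos = int(fields[1])
        if fields[0] != chrom or not (start < pos <= end):
            continue
        if "," in fields[4]:
            continue  # assumption: skip multiallelic sites
        alleles = []
        for c in cols:
            gt = fields[c].split(":")[0]  # GT is the first FORMAT key
            alleles.extend(gt.replace("|", "/").split("/"))
        if "." in alleles:
            continue  # assumption: skip sites with missing calls
        counts.append(sum(a != "0" for a in alleles))
    return counts

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alt-allele counts among n chromosomes.

    Returns None when D is undefined (no segregating sites or n < 4).
    """
    seg = [k for k in alt_counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    if s == 0 or n < 4:
        return None
    # Mean number of pairwise differences (pi).
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    if var <= 0:
        return None
    return (pi - s / a1) / math.sqrt(var)

# Tiny made-up example: three samples, one BED interval chr1:99-250.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
demo_vcf = [
    "##fileformat=VCFv4.2",
    header,
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\t0|0",
    "chr1\t150\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0",
    "chr1\t200\t.\tG\tA,C\t.\tPASS\t.\tGT\t0|1\t0|2\t0|0",
    "chr1\t300\t.\tT\tC\t.\tPASS\t.\tGT\t0|1\t0|0\t0|0",
]
demo_samples = ["S1", "S2"]
demo_counts = region_alt_counts(demo_vcf, "chr1", 99, 250, demo_samples)
print(demo_counts, tajimas_d(demo_counts, 2 * len(demo_samples)))
```

Calling `region_alt_counts` once per BED record (for example, one record per exon, intron, or untranslated region) gives a per-functional-unit statistic in place of a fixed genome-wide bin. For real datasets, an indexed VCF and an established toolkit would be far more efficient than scanning lines in Python.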


