GTQC: Automated Genotyping Array Quality Control and Report
Shilin Zhao, Limin Jiang, Hui Yu, Yan Guo
Journal of Genomics, vol. 10, pp. 39-44. Published 2022-02-14. DOI: 10.7150/jgen.69860
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8922302/pdf/
Abstract
Genotyping arrays are the most economical approach for conducting large-scale genome-wide genetic association studies. Thorough quality control is key to generating high-integrity genotyping data and robust results. Quality control of genotyping array data is generally a complicated process, as it requires intensive manual labor to implement the established protocols and compile a comprehensive quality report. There is an urgent need to reduce manual intervention through an automated quality control process. Based on previously established protocols and strategies, we developed an R package, GTQC (GenoTyping Quality Control), to automate most of the quality control steps for general array genotyping data. GTQC covers a comprehensive spectrum of genotype data quality metrics and produces a detailed HTML report comprising tables and figures. Here, we describe the concepts underpinning GTQC and demonstrate its effectiveness using a real genotyping dataset. This automation enables swift and rigorous data quality inspection prior to downstream GWAS and related analyses. By substantially reducing the time spent on genotyping quality control, GTQC helps make better use of available resources and reduces wasted manual effort. GTQC is available at https://github.com/slzhao/GTQC.
About the journal:
Journal of Genomics publishes high-quality papers in all areas of genes, genetics, genomics, proteomics, metabolomics, DNA/RNA, computational biology, bioinformatics, and other relevant areas of research and application. Articles published by the journal are rigorously peer-reviewed. Article types include: Research paper, Short research communication, Review or mini-review, Commentary, Database, Software.