Darrell O Ricke, Derek Ng, Adam Michaleas, Philip Fremont-Smith
DOI: 10.1089/omi.2023.0078 (https://doi.org/10.1089/omi.2023.0078). Journal article, published 2023-11-01 (Epub 2023-11-10).
Omics Analysis and Quality Control Pipelines in a High-Performance Computing Environment.
Data quality is often overlooked in the analysis of omics data. This is particularly relevant in studies of chemical and pathogen exposures, which can modify an individual's epigenome and transcriptome with changes that persist over time. Portable quality control (QC) pipelines for multiple omics data types are therefore needed. To meet this need, portable quality assurance (QA) metrics, metric acceptability criteria, and pipelines to compute these metrics were developed and consolidated into one framework for 12 different omics assays. The performance of these QA metrics and pipelines was evaluated on human data generated by the Defense Advanced Research Projects Agency (DARPA) Epigenetic CHaracterization and Observation (ECHO) program. Twelve analytical pipelines were developed, leveraging standard tools when possible. These QC pipelines were containerized using Singularity to ensure portability and scalability. Datasets for these 12 omics assays were analyzed and the results summarized. The quality thresholds and metrics used are described. We found that these pipelines enabled early identification of lower-quality datasets, datasets with too few reads that required additional sequencing, and experimental protocols needing refinement. These omics data analysis and QC pipelines are available as open-source resources, as reported and discussed in this article, for the omics and life sciences communities.
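The idea of checking computed QA metrics against acceptability criteria can be illustrated with a minimal Python sketch. The abstract does not state the actual metrics or thresholds, so everything here is an assumption: the metric names (`total_reads`, `mapped_fraction`), the cutoff values, and the `Threshold` and `evaluate` helpers are hypothetical and are not taken from the authors' pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A minimum acceptable value for one QA metric (hypothetical)."""
    metric: str
    minimum: float


# Illustrative cutoffs only; the real criteria are described in the article.
THRESHOLDS = [
    Threshold("total_reads", 20_000_000),
    Threshold("mapped_fraction", 0.80),
]


def evaluate(metrics, thresholds):
    """Return a list of failure messages; an empty list means the dataset passes."""
    failures = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # A metric that was never computed is treated as a failure.
            failures.append(f"{t.metric}: missing")
        elif value < t.minimum:
            failures.append(f"{t.metric}: {value} is below minimum {t.minimum}")
    return failures


if __name__ == "__main__":
    sample = {"total_reads": 5_000_000, "mapped_fraction": 0.91}
    for message in evaluate(sample, THRESHOLDS):
        print("FAIL", message)
```

Under these assumed cutoffs, a dataset with too few reads is flagged as soon as its metrics are computed, which is the kind of early signal the abstract describes for deciding on additional sequencing.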