Metapipeline-DNA: A Comprehensive Germline & Somatic Genomics Nextflow Pipeline
Yash Patel, Chenghao Zhu, Takafumi N Yamaguchi, Nicholas K Wang, Nicholas Wiltsie, Nicole Zeltser, Alfredo E Gonzalez, Helena K Winata, Yu Pan, Mohammed Faizal Eeman Mootor, Timothy Sanders, Sorel T Fitz-Gibbon, Cyriac Kandoth, Julie Livingstone, Lydia Y Liu, Benjamin Carlin, Aaron Holmes, Jieun Oh, John Sahrmann, Shu Tao, Stefan Eng, Rupert Hugh-White, Kiarod Pashminehazar, Andrew Park, Arpi Beshlikyan, Madison Jordan, Selina Wu, Mao Tian, Jaron Arbet, Beth Neilsen, Roni Haas, Yuan Zhe Bugh, Gina Kim, Joseph Salmingo, Wenshu Zhang, Aakarsh Anand, Edward Hwang, Anna Neiman-Golden, Philippa Steinberg, Wenyan Zhao, Prateek Anand, Raag Agrawal, Brandon L Tsai, Paul C Boutros
bioRxiv: the preprint server for biology. Published 2025-03-04. DOI: https://doi.org/10.1101/2024.09.04.611267
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398472/pdf/
Abstract
The price, quality and throughput of DNA sequencing continue to improve. Algorithmic innovations have allowed inference of a growing range of features from DNA sequencing data, quantifying nuclear, mitochondrial and evolutionary aspects of both germline and somatic genomes. To automate analyses of the full range of genomic characteristics, we created an extensible Nextflow meta-pipeline called metapipeline-DNA. Metapipeline-DNA analyzes targeted and whole-genome sequencing data from raw reads through pre-processing, feature detection by multiple algorithms, quality control and data visualization. Each step can be run independently and is supported by robust software engineering, including automated failure recovery, thorough testing and consistent verification of inputs, outputs and parameters. Metapipeline-DNA is cloud-compatible and highly configurable, with options to subset and optimize each analysis. Metapipeline-DNA facilitates high-scale, comprehensive analysis of DNA sequencing data.