Purpose
Newborn screening identifies rare diseases that result from the recessive inheritance of pathogenic variants in both copies of a gene. Long-read genome sequencing (LRS) can identify and phase genomic variants, but further work is needed to adapt LRS to low-yield DNA samples.
Methods
High-molecular-weight genomic DNA was obtained from 2 patients with cystic fibrosis: one whole-blood sample (CF1) and one newborn dried blood spot sample (CF2). Library preparation and genome sequencing (30-fold coverage) were performed using 20 ng of DNA input on both the PacBio Revio long-read system and the Illumina NovaSeq short-read sequencer. Single-nucleotide variants, small indels, and structural variants were identified for each data set.
Results
Genotype concordance between long- and short-read genome sequencing data was higher for single-nucleotide variants than for small indels. Both technologies accurately identified the known pathogenic variants in the CFTR gene (CF1: p.(Met607_Gln634del), p.(Phe508del); CF2: p.(Phe508del), p.(Ala455Glu)), with complete concordance for the polymorphic poly-TG and consecutive poly-T tracts. Using PacBio read-based haplotype phasing, we determined the allelic phase and identified compound heterozygosity of pathogenic variants separated by genomic distances of 32.4 kb (CF1) and 10.8 kb (CF2).
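The logic behind calling compound heterozygosity from phased data can be illustrated with a minimal sketch. It assumes phased calls are stored as VCF-style genotypes (`0|1` or `1|0`) with a phase-set identifier: two heterozygous variants in the same phase set are in trans (compound heterozygous) when their alternate alleles lie on different haplotypes, and in cis when they lie on the same one. The `PhasedCall` class, the `classify_pair` function, and the genotype values below are hypothetical illustrations, not the pipeline or data used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhasedCall:
    """One variant call (hypothetical container, not a library type)."""
    name: str
    gt: str                    # VCF-style genotype, e.g. "0|1" (phased) or "0/1" (unphased)
    phase_set: Optional[int]   # phase-set identifier; None if the call is unphased


def alt_haplotype(call: PhasedCall) -> Optional[int]:
    """Return 0 or 1 for the haplotype carrying the alternate allele.

    Returns None if the call is unphased or not a simple heterozygote.
    """
    if "|" not in call.gt:
        return None
    alleles = call.gt.split("|")
    if sorted(alleles) != ["0", "1"]:
        return None
    return alleles.index("1")


def classify_pair(a: PhasedCall, b: PhasedCall) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'unresolved'."""
    hap_a, hap_b = alt_haplotype(a), alt_haplotype(b)
    if hap_a is None or hap_b is None:
        return "unresolved"
    # Haplotype labels are only comparable within a single phase set.
    if a.phase_set is None or a.phase_set != b.phase_set:
        return "unresolved"
    return "cis" if hap_a == hap_b else "trans"


# Illustrative values only (assumed, not taken from the study data).
variant_a = PhasedCall("variant_A", "0|1", phase_set=1)
variant_b = PhasedCall("variant_B", "1|0", phase_set=1)
print(classify_pair(variant_a, variant_b))  # trans
```

A "trans" result for two pathogenic variants in the same gene corresponds to compound heterozygosity. When the variants fall in different phase sets, the sketch reports "unresolved", which is the situation where parental testing would otherwise be needed.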
Conclusion
LRS enables haplotype phasing of rare pathogenic variants from minimal DNA input. This approach has the potential to eliminate the need for parental testing, thereby shortening the time to diagnosis in genetic disease screening.