Control for stochastic sampling variation and qualitative sequencing error in next generation sequencing

Journal: Biomolecular Detection and Quantification (Q1, Biochemistry, Genetics and Molecular Biology). Pub Date: 2015-09-01. DOI: 10.1016/j.bdq.2015.08.003

Thomas Blomquist , Erin L. Crawford , Jiyoun Yeo , Xiaolu Zhang , James C. Willey

Biomolecular Detection and Quantification, Volume 5, Pages 30-37

Citations: 10

Abstract

Background

Clinical implementation of Next-Generation Sequencing (NGS) is challenged by poor control for stochastic sampling, library preparation biases, and qualitative sequencing error. To address these challenges, we developed and tested two hypotheses.

Methods

Hypothesis 1: Analytical variation in quantification is predicted by stochastic sampling effects at input of (a) amplifiable nucleic acid target molecules into the library preparation, (b) amplicons from the library into the sequencer, or (c) both. We derived equations using Monte Carlo simulation to predict the assay coefficient of variation (CV) under each of these three working models. We then tested the equations against NGS data from specimens with well-characterized molecule inputs and sequence counts, prepared using a competitive multiplex-PCR amplicon-based NGS library preparation method that includes synthetic internal standards (IS).

Hypothesis 2: Frequencies of technically derived qualitative sequencing errors (i.e., base substitution, insertion, and deletion) observed at each base position in each target native template (NT) are concordant with those observed in the respective competitive synthetic IS present in the same reaction. We measured error frequencies at each base position within amplicons from each of 30 target NT, then tested whether they corresponded to those within the 30 respective IS.
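The three working models in Hypothesis 1 can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation or their derived equations: it assumes, purely for illustration, that each sampling event is Poisson distributed, and the names `poisson`, `simulate_cv`, `mean_molecules`, and `mean_reads` are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for lam up to a few hundred)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_cv(mean_molecules, mean_reads, model="both", trials=5000, seed=0):
    """Estimate the CV of the read count under one of three assumed sampling models.

    model="input":     sampling only at input of molecules into library preparation
    model="sequencer": sampling only at input of amplicons into the sequencer
    model="both":      both sampling events in sequence
    """
    if model not in ("input", "sequencer", "both"):
        raise ValueError("model must be 'input', 'sequencer', or 'both'")
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        # Event (a): number of target molecules that actually enter library preparation.
        if model in ("input", "both"):
            molecules = poisson(rng, mean_molecules)
        else:
            molecules = mean_molecules
        # Expected reads scale with the fraction of the nominal input that was loaded.
        expected_reads = mean_reads * molecules / mean_molecules
        # Event (b): number of amplicons from the library that are actually sequenced.
        if model in ("sequencer", "both"):
            reads = poisson(rng, expected_reads)
        else:
            reads = expected_reads
        counts.append(reads)
    return statistics.pstdev(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    for name in ("input", "sequencer", "both"):
        print(name, round(simulate_cv(20, 50, model=name), 3))
```

Under this Poisson assumption, the law of total variance gives CV² = 1/m + 1/r for the combined model, where m is the mean number of input molecules and r is the mean number of sequence reads, so the simulated CV can be checked against a closed form. The model actually fitted in the paper may differ.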

Results

For hypothesis 1, the Monte Carlo model that incorporated both sampling events best predicted CV and explained 74% of observed assay variance. For hypothesis 2, the observed frequency and type of sequence variation at each base position within each IS were concordant with those observed in the respective NT (R² = 0.93).
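A concordance statistic of this kind can be computed as a squared Pearson correlation between paired per-position error frequencies. The sketch below assumes that definition of R², which the abstract does not specify. The function name `r_squared` is hypothetical, and the input values are made-up toy numbers, not data from the study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Toy, illustrative error frequencies at four base positions (not from the paper).
    nt_error_freq = [1e-4, 5e-4, 2e-3, 8e-4]
    is_error_freq = [2e-4, 4e-4, 2.2e-3, 7e-4]
    print(r_squared(nt_error_freq, is_error_freq))
```

In practice the pairing would be by base position and error type within each NT and its matched IS. Whether the frequencies are log-transformed before correlation is a further analysis choice left open here.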

Conclusion

In targeted NGS, synthetic competitive IS control for stochastic sampling at input of both target into library preparation and target library product into the sequencer, and they control for qualitative errors generated during library preparation and sequencing. These controls enable accurate clinical diagnostic reporting of confidence limits and limit of detection for copy number measurement, and of the frequency of each actionable mutation.
