A 28nm Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing

IEEE Transactions on Biomedical Circuits and Systems, vol. 19, no. 6, pp. 1105-1119 (IF 4.9) | Pub Date: 2025-03-27 | DOI: 10.1109/TBCAS.2025.3555579

Yi-Chung Wu; Yen-Lung Chen; Chung-Hsuan Yang; Chao-Hsi Lee; Wen-Ching Chen; Liang-Yi Lin; Nian-Shyang Chang; Chun-Pin Lin; Chi-Shi Chen; Jui-Hung Hung; Chia-Hsiang Yang

{"title":"A 28nm Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing","authors":"Yi-Chung Wu;Yen-Lung Chen;Chung-Hsuan Yang;Chao-Hsi Lee;Wen-Ching Chen;Liang-Yi Lin;Nian-Shyang Chang;Chun-Pin Lin;Chi-Shi Chen;Jui-Hung Hung;Chia-Hsiang Yang","doi":"10.1109/TBCAS.2025.3555579","DOIUrl":null,"url":null,"abstract":"This paper presents the first end-to-end next-generation sequencing (NGS) data analysis accelerator for short-read mapping, haplotype calling, variant calling, and genotyping. It supports both single-end and paired-end short-reads (or reads) and uses the FM-index, a compact index data structure, for exact-match in short-read mapping. For inexact match part of short-read mapping, a dynamic programming array is proposed to determine the mapping results. To reduce the workload of short-read mapping, a rapid similarity calculation is designed. A rescue technique is also adopted to increase the overall sensitivity. In haplotype calling, a parallel <inline-formula><tex-math>$k$</tex-math></inline-formula>-mer processing engine can construct the <italic>de Bruijn</i> graph and assemble the haplotypes. The variant calling step determines variants between a subject and a reference genome sequence with a variant discovery engine. Lastly, genotype likelihood is computed in parallel by a genotype likelihood computing engine, which outputs genotypes of all discovered variants and corresponding Phred-scaled likelihood (PL) values. This work completes end-to-end data analysis for the 50<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> PrecisionFDA dataset in an average of 28.2 minutes. It achieves a 3-to-59<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> higher throughput than the existing solutions with higher precision (99.79%) and sensitivity (99.03%). 
The chip also achieves a 935<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> higher energy efficiency than the Illumina DRAGEN FPGA acceleration system.","PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"19 6","pages":"1105-1119"},"PeriodicalIF":4.9000,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on biomedical circuits and systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10944550/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Citations: 0

Abstract

This paper presents the first end-to-end next-generation sequencing (NGS) data analysis accelerator for short-read mapping, haplotype calling, variant calling, and genotyping. It supports both single-end and paired-end short reads (or reads) and uses the FM-index, a compact index data structure, for exact matching in short-read mapping. For the inexact-matching part of short-read mapping, a dynamic programming array is proposed to determine the mapping results. To reduce the workload of short-read mapping, a rapid similarity calculation is designed. A rescue technique is also adopted to increase the overall sensitivity. In haplotype calling, a parallel $k$-mer processing engine constructs the de Bruijn graph and assembles the haplotypes. The variant calling step uses a variant discovery engine to determine variants between a subject and a reference genome sequence. Lastly, genotype likelihoods are computed in parallel by a genotype likelihood computing engine, which outputs the genotypes of all discovered variants and the corresponding Phred-scaled likelihood (PL) values. This work completes end-to-end data analysis for the 50$\times$ PrecisionFDA dataset in an average of 28.2 minutes. It achieves a 3-to-59$\times$ higher throughput than existing solutions, with higher precision (99.79%) and sensitivity (99.03%). The chip also achieves a 935$\times$ higher energy efficiency than the Illumina DRAGEN FPGA acceleration system.
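The abstract names the FM-index as the structure used for exact matching but does not describe how the chip implements it. As a rough illustration of the general technique only, the following Python sketch builds a toy FM-index and runs the standard backward search over it. The function names `build_fm_index` and `exact_match`, the naive suffix-array construction, and the uncompressed occurrence table are all assumptions made for readability; they are not taken from the paper, and a practical index would use sampled tables and a much more efficient construction.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for `text` (hypothetical helper, not the paper's design)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch] = number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i] (stored uncompressed).
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, c_table, occ


def exact_match(pattern, sa, c_table, occ):
    """Backward search: return the sorted start positions of `pattern`."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # range became empty: no exact occurrence
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"  # made-up reference, for illustration only
    sa, c_table, occ = build_fm_index(reference)
    print(exact_match("ACGT", sa, c_table, occ))  # [0, 4]
    print(exact_match("GGG", sa, c_table, occ))   # []
```

Each pattern character costs two table lookups regardless of reference length, which is the property that makes this kind of index attractive for exact matching of short reads.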


