GCOC: A Genome Classifier-On-Chip Based on Similarity Search Content Addressable Memory
Yuval Harary; Paz Snapir; Shir Siman Tov; Chen Kruphman; Eyal Rechef; Zuher Jahshan; Esteban Garzón; Leonid Yavits
IEEE Transactions on Biomedical Circuits and Systems, vol. 19, no. 3, pp. 484-495. Published 2024-08-28.
DOI: 10.1109/TBCAS.2024.3449788
https://ieeexplore.ieee.org/document/10654290/
Citation count: 0
Abstract
GCOC is a genome classification system-on-chip (SoC) that classifies genomes by $k$-mer matching. This approach divides a DNA query sequence into a set of short DNA fragments of size $k$ ($k$-mers), which are searched in a reference genome database, under the assumption that sequenced DNA reads of the same organism (or its close variants) share most of these $k$-mers. At the core of GCOC is a similarity (approximate) search-capable Content Addressable Memory (SAS-CAM), which, in addition to exact match, supports approximate, Hamming-distance-tolerant search. The classification operation is controlled by an embedded RISC-V processor. The GCOC classification platform was designed and manufactured in a commercial 65 nm process. We conduct a thorough analysis of GCOC classification efficiency as well as its performance, silicon area, and power consumption using silicon measurements. GCOC classifies 769.2K short DNA reads per second. The silicon area of the GCOC SoC is 3.12 $\mathrm{mm}^{2}$ and its power consumption is 1.27 $\mathrm{mW}$. We envision GCOC deployed as a field-portable classifier (for example, at points of care), where classification must be real-time, easy to operate, and energy efficient.
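
The $k$-mer matching idea described in the abstract can be illustrated with a small software sketch. This is not the GCOC hardware or the authors' algorithm: it is a plain Python emulation, assuming a stride-1 sliding window, a per-class set of reference $k$-mers, and a simple majority vote. The names `kmers`, `hamming`, `build_reference`, `classify`, and `max_dist` are hypothetical, and the sequences are toy data. Where a SAS-CAM would compare a query against all stored entries in parallel, the loop below does so sequentially.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq (sliding window, stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_reference(genomes, k):
    """Map each class label to the set of k-mers in its reference sequence."""
    return {label: set(kmers(seq, k)) for label, seq in genomes.items()}


def classify(read, reference, k, max_dist=0):
    """Classify a read by k-mer voting.

    A class receives one vote per query k-mer that lies within max_dist
    mismatches of any of its stored k-mers (max_dist=0 is exact match).
    Returns (best_label, votes); best_label is None if nothing matched.
    """
    votes = Counter()
    for q in kmers(read, k):
        for label, stored in reference.items():
            if any(hamming(q, s) <= max_dist for s in stored):
                votes[label] += 1
    if not votes:
        return None, votes
    return votes.most_common(1)[0][0], votes


# Toy reference database (illustrative sequences, not real genomes).
genomes = {
    "A": "ACGTACGTGACCTGAA",
    "B": "TTGGCCAATTGGCCAA",
}
K = 4
reference = build_reference(genomes, K)

# A read taken from genome "A" with one substituted base.
noisy_read = "ACGTGTCC"
exact_label, exact_votes = classify(noisy_read, reference, K, max_dist=0)
approx_label, approx_votes = classify(noisy_read, reference, K, max_dist=1)
print(exact_label, dict(exact_votes))
print(approx_label, dict(approx_votes))
```

In this toy case a single substitution breaks three of the five query $k$-mers under exact matching, while a tolerance of one mismatch recovers them. That is the motivation for Hamming-distance-tolerant search, although a looser tolerance can also produce matches against unrelated classes.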