ONNXim: A Fast, Cycle-level Multi-core NPU Simulator
Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim
arXiv:2406.08051 [cs.PF] (arXiv - CS - Performance), published 2024-06-12
Abstract
As DNNs are widely adopted in various application domains while demanding
increasingly higher compute and memory requirements, designing efficient and
performant NPUs (Neural Processing Units) is becoming more important. However,
existing architectural NPU simulators lack support for high-speed simulation,
multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, and/or
different deep learning frameworks. To address these limitations, this work
proposes ONNXim, a fast cycle-level simulator for multi-core NPUs in DNN
serving systems. For ease of simulation, it takes as input DNN models in the
ONNX graph format, which can be exported from various deep learning frameworks. In
addition, based on the observation that typical NPU cores process tensor tiles
from on-chip scratchpad memory with deterministic compute latency, we forgo
detailed modeling of the computation while still preserving simulation
accuracy. ONNXim also preserves dependencies between compute and tile DMAs.
Meanwhile, the DRAM and NoC are modeled at the cycle level to properly capture
contention among multiple cores that can execute different DNN models for
multi-tenancy. Consequently, ONNXim is significantly faster than existing
simulators (e.g., by up to 384x over Accel-sim) and enables various case
studies, such as multi-tenant NPUs, that were previously impractical due to
slow simulation speed and/or lack of functionality. ONNXim is publicly available at
https://github.com/PSAL-POSTECH/ONNXim.
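The key modeling idea in the abstract (deterministic per-tile compute latency combined with explicit dependencies between tile DMAs and compute, so that only the memory side needs detailed timing) can be illustrated with a minimal toy sketch. This is an independent illustration, not ONNXim's actual code; the function name, the fixed DMA latencies, and the single-DMA-engine/single-compute-unit assumption are all invented for the example.

```python
# Toy timing model: each tile must be DMA'd into the scratchpad before
# its compute can start. Compute latency is deterministic, so it needs
# no detailed modeling -- the observation the paper exploits. The DMA
# latencies are fixed inputs here; in a real simulator they would come
# from a cycle-level DRAM/NoC model that captures contention.
def simulate_tiles(tiles):
    """tiles: list of (dma_cycles, compute_cycles) per tile, in order.
    Returns the cycle at which the last tile's compute finishes.
    Assumes one DMA engine and one compute unit, overlapped
    (double-buffered) execution."""
    dma_free = 0      # cycle when the DMA engine is next free
    compute_free = 0  # cycle when the compute unit is next free
    for dma_cycles, compute_cycles in tiles:
        dma_done = dma_free + dma_cycles
        dma_free = dma_done
        # Compute waits for both its input tile's DMA and the compute unit.
        start = max(dma_done, compute_free)
        compute_free = start + compute_cycles
    return compute_free

# Three tiles, DMA 10 cycles and compute 8 cycles each: DMA and compute
# overlap, so the total (38 cycles) is well below the serial sum (54).
print(simulate_tiles([(10, 8), (10, 8), (10, 8)]))
```

Because compute timing is a closed-form function of tile sizes, the only events that need cycle-accurate simulation are the DMAs; this is what lets a simulator in this style stay fast while remaining accurate for memory-bound, multi-tenant workloads.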