Yannick Braatz, D. Rieber, T. Soliman, O. Bringmann
{"title":"Simulation-driven Latency Estimations for Multi-core Machine Learning Accelerators","authors":"Yannick Braatz, D. Rieber, T. Soliman, O. Bringmann","doi":"10.1109/AICAS57966.2023.10168589","DOIUrl":null,"url":null,"abstract":"Underutilization of compute resources leads to decreased performance of single-core machine learning (ML) accelerators. Therefore, multi-core accelerators divide the computational load among multiple smaller groups of processing elements (PEs), keeping more resources active in parallel. However, while producing higher throughput, the accelerator behavior becomes more complex. Supplying multiple cores with data demands adjustments to the on-chip memory hierarchy and direct memory access controller (DMAC) programming. Correctly estimating these effects becomes crucial for optimizing multi-core accelerators, especially in design space exploration (DSE). This work introduces a novel semi-simulated prediction methodology for latency estimations in multi-core ML accelerators. Simulating only dynamic system interactions while determining the latency of isolated accelerator elements analytically makes the proposed methodology precise and fast. We evaluate our methodology on an in-house configurable accelerator with various computational cores on two widely used convolutional neural networks (CNNs). We can estimate the accelerator latency with an average error of 4.7%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS57966.2023.10168589","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Simulation-driven Latency Estimations for Multi-core Machine Learning Accelerators
Underutilization of compute resources leads to decreased performance of single-core machine learning (ML) accelerators. Therefore, multi-core accelerators divide the computational load among multiple smaller groups of processing elements (PEs), keeping more resources active in parallel. However, while producing higher throughput, the accelerator behavior becomes more complex. Supplying multiple cores with data demands adjustments to the on-chip memory hierarchy and direct memory access controller (DMAC) programming. Correctly estimating these effects becomes crucial for optimizing multi-core accelerators, especially in design space exploration (DSE). This work introduces a novel semi-simulated prediction methodology for latency estimations in multi-core ML accelerators. Simulating only dynamic system interactions while determining the latency of isolated accelerator elements analytically makes the proposed methodology precise and fast. We evaluate our methodology on an in-house configurable accelerator with various computational cores on two widely used convolutional neural networks (CNNs). We can estimate the accelerator latency with an average error of 4.7%.