NEST-C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators

IF 1.3 · CAS Tier 4 (Computer Science) · JCR Q3 (Engineering, Electrical & Electronic) · ETRI Journal · Pub Date: 2024-10-28 · DOI: 10.4218/etrij.2024-0139
Jeman Park, Misun Yu, Jinse Kwon, Junmo Park, Jemin Lee, Yongin Kwon
{"title":"NEST-C:用于带有人工智能加速器的异构计算系统的深度学习编译器框架","authors":"Jeman Park,&nbsp;Misun Yu,&nbsp;Jinse Kwon,&nbsp;Junmo Park,&nbsp;Jemin Lee,&nbsp;Yongin Kwon","doi":"10.4218/etrij.2024-0139","DOIUrl":null,"url":null,"abstract":"<p>Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.</p>","PeriodicalId":11901,"journal":{"name":"ETRI Journal","volume":"46 5","pages":"851-864"},"PeriodicalIF":1.3000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2024-0139","citationCount":"0","resultStr":"{\"title\":\"NEST-C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators\",\"authors\":\"Jeman Park,&nbsp;Misun Yu,&nbsp;Jinse Kwon,&nbsp;Junmo Park,&nbsp;Jemin Lee,&nbsp;Yongin Kwon\",\"doi\":\"10.4218/etrij.2024-0139\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. 
These benefits contribute to more efficient DL model deployment in modern AI applications.</p>\",\"PeriodicalId\":11901,\"journal\":{\"name\":\"ETRI Journal\",\"volume\":\"46 5\",\"pages\":\"851-864\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2024-0139\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ETRI Journal\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.4218/etrij.2024-0139\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ETRI Journal","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.4218/etrij.2024-0139","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency, but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.
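The abstract highlights profiling-based quantization as one of NEST-C's core techniques. As a rough sketch of what such a step generally involves (this is not NEST-C's actual API; `run_model` and the per-tensor activation dictionary are hypothetical placeholders), a calibration pass might record activation ranges on representative inputs and then derive int8 affine quantization parameters from the observed ranges:

```python
# Illustrative sketch only; NEST-C's real interfaces are not shown in the abstract.
# Idea: run the FP32 model on calibration data, record per-tensor min/max, and
# map each observed range onto signed int8 with an affine (scale, zero-point) pair.

import numpy as np

def profile_ranges(run_model, calibration_batches):
    """Run the FP32 model on calibration data and record per-tensor min/max."""
    ranges = {}
    for batch in calibration_batches:
        activations = run_model(batch)  # hypothetical: returns {tensor_name: ndarray}
        for name, tensor in activations.items():
            lo, hi = float(tensor.min()), float(tensor.max())
            if name in ranges:
                prev_lo, prev_hi = ranges[name]
                ranges[name] = (min(prev_lo, lo), max(prev_hi, hi))
            else:
                ranges[name] = (lo, hi)
    return ranges

def affine_qparams(lo, hi, num_bits=8):
    """Map the observed [lo, hi] range onto the signed integer range, e.g. [-128, 127]."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # the representable range must contain zero
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard against a degenerate zero-width range
    zero_point = int(np.clip(round(qmin - lo / scale), qmin, qmax))
    return scale, zero_point
```

In a compiler setting, the derived scale and zero-point for each tensor would then guide rewriting FP32 operators into integer counterparts supported by the target accelerator; this per-tensor decision is the kind of work a profiling-based quantization stage automates.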

Source journal: ETRI Journal (Engineering & Technology - Telecommunications)

CiteScore: 4.00
Self-citation rate: 7.10%
Publication volume: 98
Review time: 6.9 months

About the journal: ETRI Journal is an international, peer-reviewed multidisciplinary journal published bimonthly in English. The main focus of the journal is to provide an open forum to exchange innovative ideas and technology in the fields of information, telecommunications, and electronics. Key topics of interest include high-performance computing, big data analytics, cloud computing, multimedia technology, communication networks and services, wireless communications and mobile computing, material and component technology, as well as security. With an international editorial committee and experts from around the world as reviewers, ETRI Journal publishes high-quality research papers on the latest and best developments from the global community.