{"title":"张量加速器的 LLM 辅助编译","authors":"Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Yakun Sophia Shao","doi":"arxiv-2408.03408","DOIUrl":null,"url":null,"abstract":"Hardware accelerators, in particular accelerators for tensor processing, have\nmany potential application domains. However, they currently lack the software\ninfrastructure to support the majority of domains outside of deep learning.\nFurthermore, a compiler that can easily be updated to reflect changes at both\napplication and hardware levels would enable more agile development and design\nspace exploration of accelerators, allowing hardware designers to realize\ncloser-to-optimal performance. In this work, we discuss how large language\nmodels (LLMs) could be leveraged to build such a compiler. Specifically, we\ndemonstrate the ability of GPT-4 to achieve high pass rates in translating code\nto the Gemmini accelerator, and prototype a technique for decomposing\ntranslation into smaller, more LLM-friendly steps. Additionally, we propose a\n2-phase workflow for utilizing LLMs to generate hardware-optimized code.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"23 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-Aided Compilation for Tensor Accelerators\",\"authors\":\"Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Yakun Sophia Shao\",\"doi\":\"arxiv-2408.03408\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hardware accelerators, in particular accelerators for tensor processing, have\\nmany potential application domains. However, they currently lack the software\\ninfrastructure to support the majority of domains outside of deep learning.\\nFurthermore, a compiler that can easily be updated to reflect changes at both\\napplication and hardware levels would enable more agile development and design\\nspace exploration of accelerators, allowing hardware designers to realize\\ncloser-to-optimal performance. In this work, we discuss how large language\\nmodels (LLMs) could be leveraged to build such a compiler. Specifically, we\\ndemonstrate the ability of GPT-4 to achieve high pass rates in translating code\\nto the Gemmini accelerator, and prototype a technique for decomposing\\ntranslation into smaller, more LLM-friendly steps. 
Additionally, we propose a\\n2-phase workflow for utilizing LLMs to generate hardware-optimized code.\",\"PeriodicalId\":501197,\"journal\":{\"name\":\"arXiv - CS - Programming Languages\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Programming Languages\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.03408\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03408","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hardware accelerators, in particular accelerators for tensor processing, have
many potential application domains. However, they currently lack the software
infrastructure to support the majority of domains outside of deep learning.
Furthermore, a compiler that can easily be updated to reflect changes at both
application and hardware levels would enable more agile development and design
space exploration of accelerators, allowing hardware designers to realize
closer-to-optimal performance. In this work, we discuss how large language
models (LLMs) could be leveraged to build such a compiler. Specifically, we
demonstrate the ability of GPT-4 to achieve high pass rates in translating code
to the Gemmini accelerator, and prototype a technique for decomposing
translation into smaller, more LLM-friendly steps. Additionally, we propose a
2-phase workflow for utilizing LLMs to generate hardware-optimized code.
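The abstract does not include the authors' actual prompts, decomposition steps, or tooling; the sketch below is a hypothetical illustration of what such a pipeline could look like: translation broken into small per-step prompts (the "LLM-friendly steps"), a functional check corresponding to the reported pass rates, and a second phase that asks the model to optimize the already-correct code for the target hardware. The helper `llm_complete`, the step wording, and the test harness are placeholders, not the paper's method.

```python
"""Illustrative sketch only: all names and steps here are assumptions,
not the authors' implementation."""

from typing import Callable, List


def llm_complete(prompt: str) -> str:
    """Hypothetical stub for a chat-completion call (e.g. to GPT-4)."""
    raise NotImplementedError("wire this to an LLM provider")


# Phase 1: functional translation, decomposed so each prompt stays within
# what the model handles reliably.
TRANSLATION_STEPS: List[str] = [
    "Identify the tensor kernels (e.g. matmuls, convolutions) in this C code.",
    "Rewrite each identified kernel as a call to the accelerator's matmul "
    "primitive, keeping the surrounding code unchanged.",
    "Insert any data-layout or type conversions required around the new calls.",
]


def translate(source_code: str) -> str:
    """Phase 1: produce functionally equivalent accelerator code."""
    code = source_code
    for step in TRANSLATION_STEPS:
        prompt = f"{step}\n\n```c\n{code}\n```"
        code = llm_complete(prompt)
    return code


def optimize(accel_code: str, hw_description: str) -> str:
    """Phase 2: tune already-correct code for a given hardware
    configuration (tiling, scratchpad usage, etc.)."""
    prompt = (
        "Optimize this accelerator code for the following hardware "
        f"configuration, preserving its behavior:\n{hw_description}\n\n"
        f"```c\n{accel_code}\n```"
    )
    return llm_complete(prompt)


def compile_with_llm(source_code: str,
                     hw_description: str,
                     passes_tests: Callable[[str], bool],
                     max_attempts: int = 3) -> str:
    """End-to-end sketch: translate, check functional correctness (the
    'pass rate' criterion), then hardware-optimize."""
    for _ in range(max_attempts):
        candidate = translate(source_code)
        if passes_tests(candidate):
            return optimize(candidate, hw_description)
    raise RuntimeError("no candidate passed the functional tests")
```

Separating functional translation from hardware-specific optimization, as in this sketch, reflects the 2-phase structure the abstract describes: correctness is established first, so the optimization phase can be regenerated whenever the application or the hardware configuration changes.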