Tackling the Matrix Multiplication Micro-kernel Generation with Exo
Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez
arXiv - CS - Mathematical Software, 2023-10-26
https://doi.org/arxiv-2310.17408
The optimization of matrix multiplication (GEMM) has been a persistent need
over the last decades. This operation is considered the flagship of current
linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its
widespread use in a large variety of scientific applications. The GEMM is
usually implemented following the GotoBLAS philosophy, which tiles the GEMM
operands and uses a series of nested loops for performance improvement. These
approaches extract the maximum computational power of the architectures through
small pieces of hardware-oriented, high-performance code called micro-kernels.
However, this approach forces developers to write, with non-negligible
effort, a dedicated micro-kernel for each new hardware target. In this work,
we present a step-by-step procedure for generating
micro-kernels with the Exo compiler that perform close to (or even better
than) manually developed micro-kernels written with intrinsic functions or
assembly language. Our solution also improves the portability of the generated
code, since a hardware target is fully specified by a concise library-based
description of its instructions.
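
The GotoBLAS-style structure mentioned in the abstract can be illustrated with a minimal pure-Python sketch: five blocking loops around a small micro-kernel that updates one tile of C. This is not code from the paper and not Exo output. The function names, the row-major flat-list layout, and the block sizes (`mc`, `nc`, `kc`, `mr`, `nr`) are assumptions chosen for illustration, and the operand-packing step used by real implementations is omitted.

```python
def micro_kernel(A, B, C, lda, ldb, ldc, i0, j0, p0, mr, nr, kc):
    """Update an mr x nr tile of C with kc rank-1 updates (illustrative only)."""
    # Local accumulator tile; in a real micro-kernel this lives in vector registers.
    acc = [[0.0] * nr for _ in range(mr)]
    for p in range(kc):
        for i in range(mr):
            a = A[(i0 + i) * lda + (p0 + p)]
            for j in range(nr):
                acc[i][j] += a * B[(p0 + p) * ldb + (j0 + j)]
    # Write the accumulated tile back into C.
    for i in range(mr):
        for j in range(nr):
            C[(i0 + i) * ldc + (j0 + j)] += acc[i][j]


def gemm_blocked(A, B, C, m, n, k, mc=4, nc=4, kc=4, mr=2, nr=2):
    """C += A @ B for row-major flat lists, tiled in the GotoBLAS loop order.

    Block sizes are hypothetical; real libraries tune them to the cache
    hierarchy and register file of the target, and pack A and B first.
    """
    for jc in range(0, n, nc):                 # loop over column blocks of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):             # loop over the shared k dimension
            kb = min(kc, k - pc)
            for ic in range(0, m, mc):         # loop over row blocks of A and C
                mb = min(mc, m - ic)
                for jr in range(0, nb, nr):    # loop over micro-tile columns
                    for ir in range(0, mb, mr):  # loop over micro-tile rows
                        micro_kernel(A, B, C, k, n, n,
                                     ic + ir, jc + jr, pc,
                                     min(mr, mb - ir), min(nr, nb - jr), kb)


def gemm_reference(A, B, C, m, n, k):
    """Plain triple-loop GEMM used to check the blocked version."""
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += A[i * k + p] * B[p * n + j]
            C[i * n + j] += s
```

In this sketch only `micro_kernel` would need to be rewritten for a new architecture, while the surrounding loops stay the same. That per-target piece is the part the abstract describes generating with Exo.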