SAPIVe: Simple AVX to PIM Vectorizer
Rodrigo M. Sokulski, P. C. Santos, Sairo R. dos Santos, M. Alves
2022 XII Brazilian Symposium on Computing Systems Engineering (SBESC), 2022-11-21. DOI: 10.1109/SBESC56799.2022.9964539
Abstract
Wider vector extensions are a common technique for meeting the growing demands placed on computational systems. These extensions, which operate on multiple data elements with a single instruction, put considerable pressure on the memory hierarchy and aggravate long-standing problems such as the memory wall and the von Neumann bottleneck. One alternative is to place processing elements close to the memory, an approach known as Processing-In-Memory (PIM). As with processor vector extensions, the most efficient PIM techniques rely on in-memory vector processing units. There are several ways to convert code to in-memory vector processing; among them, binary translation in hardware requires no programmer effort or adapted software and can be performed transparently to users. In the context of in-memory processing, however, this conversion faces challenges related to the format of PIM instructions and to the structure of the loops found in each application. This article therefore proposes and evaluates the Simple AVX to PIM Vectorizer (SAPIVe), a hardware mechanism that translates processor vector instructions into in-memory vector instructions which, besides processing more data per instruction, perform loads, operations, and stores at once. Our results show that the mechanism accelerates kernels by up to 5x, while loop predictors prevent potential performance losses.
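To make the translation target concrete, the sketch below shows the kind of AVX kernel a binary translator of this sort would intercept: a streaming vector add whose load/operate/store triplet could, in principle, be fused into a single in-memory vector operation. This example is illustrative only and is not taken from the paper; the PIM mnemonic in the comment is a hypothetical stand-in for whatever instruction format SAPIVe actually emits.

```c
#include <immintrin.h>
#include <stddef.h>

/* Streaming vector add: the classic load -> operate -> store pattern
 * that an AVX-to-PIM binary translator could detect at runtime. */
void vec_add(const float *a, const float *b, float *c, size_t n) {
    size_t i = 0;
    /* Process 8 floats per iteration with 256-bit AVX registers. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* AVX load      */
        __m256 vb = _mm256_loadu_ps(b + i);   /* AVX load      */
        __m256 vc = _mm256_add_ps(va, vb);    /* AVX operation */
        _mm256_storeu_ps(c + i, vc);          /* AVX store     */
        /* Conceptually, an in-memory vectorizer could collapse this
         * whole triplet into one wide PIM instruction, e.g.
         *   PIM_VADD c[i..i+V-1], a[i..i+V-1], b[i..i+V-1]
         * (hypothetical mnemonic; the real encoding is defined by
         * the PIM architecture SAPIVe targets). */
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The interesting part for a translator is that the loop body touches memory in a regular, contiguous pattern, which is exactly what lets the load, the arithmetic, and the store be expressed as a single operation executed near the data; irregular loop structures are where, per the abstract, the loop predictors come in to avoid performance losses.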