{"title":"A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System","authors":"Lingfei Lu;Yudi Qiu;Shiyan Yi;Yibo Fan","doi":"10.1109/LCA.2023.3305668","DOIUrl":null,"url":null,"abstract":"Personalized recommendation system (RS) is widely used in the industrial community and occupies much time in AI computing centers. A critical component of RS is the embedding layer, which consists of sparse embedding lookups and is memory-bounded. Recent works have proposed near-memory processing (NMP) architectures to utilize high inner-memory bandwidth to speed up embedding lookups. These NMP works divide embedding vectors either horizontally or vertically. However, the effectiveness of horizontal or vertical partitioning is hard to guarantee under different memory configurations or embedding vector sizes. To improve this issue, we propose FeaNMP, a \n<underline>f</u>\nlexible \n<underline>e</u>\nmbedding-\n<underline>a</u>\nware \n<underline>NMP</u>\n architecture that accelerates the inference phase of RS. We explore different partitioning strategies in detail and design a flexible way to select optimal ones depending on different embedding dimensions and DDR configurations. As a result, compared to the state-of-the-art rank-level NMP work RecNMP, our work achieves up to 11.1× speedup for embedding layers under mix-dimensioned workloads.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"22 2","pages":"165-168"},"PeriodicalIF":1.4000,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Architecture Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10221697/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0
Abstract
Personalized recommendation systems (RSs) are widely used in industry and account for a large share of compute time in AI computing centers. A critical component of an RS is the embedding layer, which consists of sparse embedding lookups and is memory-bound. Recent works have proposed near-memory processing (NMP) architectures that exploit high internal memory bandwidth to speed up embedding lookups. These NMP works partition embedding vectors either horizontally or vertically. However, the effectiveness of horizontal or vertical partitioning is hard to guarantee across different memory configurations and embedding vector sizes. To address this issue, we propose FeaNMP, a flexible embedding-aware NMP architecture that accelerates the inference phase of RS. We explore the partitioning strategies in detail and design a flexible mechanism that selects the optimal one depending on the embedding dimension and DDR configuration. As a result, compared to the state-of-the-art rank-level NMP work RecNMP, our work achieves up to 11.1× speedup for embedding layers under mixed-dimension workloads.
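The abstract contrasts two ways of laying an embedding table out across memory ranks and hints at a dimension-aware choice between them. The Python sketch below is only an illustration of that idea, not FeaNMP's actual mechanism: the partitioning helpers, the burst-size heuristic, and all names and parameters (NUM_RANKS, BURST_BYTES, choose_partitioning, etc.) are assumptions introduced here for clarity.

```python
import numpy as np

# Hypothetical parameters; real values depend on the DDR configuration.
NUM_RANKS = 4      # memory ranks available to the near-memory logic
BURST_BYTES = 64   # bytes delivered per DDR burst (assumed)
ELEM_BYTES = 4     # FP32 embedding elements

def horizontal_partition(table, num_ranks=NUM_RANKS):
    """Horizontal partitioning: whole rows (embedding vectors) are striped
    across ranks, so one lookup is served entirely by a single rank."""
    return [table[r::num_ranks] for r in range(num_ranks)]

def vertical_partition(table, num_ranks=NUM_RANKS):
    """Vertical partitioning: each embedding vector is split column-wise,
    so one lookup is spread over all ranks and served in parallel."""
    return np.array_split(table, num_ranks, axis=1)

def choose_partitioning(embedding_dim,
                        burst_bytes=BURST_BYTES,
                        elem_bytes=ELEM_BYTES,
                        num_ranks=NUM_RANKS):
    """Toy dimension-aware policy (an assumption, not the paper's rule):
    split a vector vertically only if each rank's slice still fills at
    least one DDR burst; otherwise keep vectors whole (horizontal)."""
    slice_bytes = embedding_dim * elem_bytes / num_ranks
    return "vertical" if slice_bytes >= burst_bytes else "horizontal"

def embedding_lookup(table, indices):
    """Pooled sparse lookup (sum pooling), the memory-bound core of the
    embedding layer: gather a few rows of a large table and reduce them."""
    return table[indices].sum(axis=0)

if __name__ == "__main__":
    for dim in (8, 32, 64, 256):  # mixed-dimension tables
        table = np.random.rand(10_000, dim).astype(np.float32)
        strategy = choose_partitioning(dim)
        pooled = embedding_lookup(table, np.array([3, 17, 42]))
        print(f"dim={dim:>3}: partition={strategy}, pooled shape={pooled.shape}")
```

The heuristic mirrors the intuition behind the abstract: for short vectors, a vertical slice per rank is smaller than one DDR burst, so splitting wastes bandwidth and horizontal partitioning wins; for long vectors, vertical partitioning lets all ranks serve one lookup in parallel. The actual selection in FeaNMP is a hardware-level design evaluated in the paper and is not reproduced here.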
About the journal:
IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.