Pub Date: 2023-08-16 DOI: 10.1109/LCA.2023.3305668
Lingfei Lu; Yudi Qiu; Shiyan Yi; Yibo Fan
Personalized recommendation systems (RS) are widely used in industry and account for much of the computing time in AI computing centers. A critical component of an RS is the embedding layer, which consists of sparse embedding lookups and is memory-bound. Recent works have proposed near-memory processing (NMP) architectures that exploit high internal memory bandwidth to speed up embedding lookups. These NMP works partition embedding vectors either horizontally or vertically. However, the effectiveness of horizontal or vertical partitioning is hard to guarantee across different memory configurations and embedding vector sizes. To address this issue, we propose FeaNMP, a f