Pub Date: 2023-11-28  DOI: 10.1109/LCA.2023.3336841
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems
Hyeseong Kim; Yunjae Lee; Minsoo Rhu
IEEE Computer Architecture Letters, vol. 23, no. 1, pp. 7-10
Deep neural network (DNN)-based recommendation systems (RecSys) are among the most successfully deployed machine learning applications in commercial services, predicting ad click-through rates or rankings. While numerous prior works have explored hardware and software solutions to reduce RecSys training time, the end-to-end training pipeline, including the data preprocessing stage, has received little attention. In this work, we provide a comprehensive analysis of RecSys data preprocessing, identifying the feature generation and normalization steps as the major performance bottleneck. Based on our characterization, we explore the efficacy of an FPGA-accelerated RecSys preprocessing system that achieves a significant 3.4-12.1× end-to-end speedup over a baseline CPU-based RecSys preprocessing system.
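To make the bottleneck concrete, the feature generation and normalization steps the abstract refers to typically transform raw click-log records into model-ready inputs. The sketch below illustrates the general shape of such a step — log-normalizing dense counters and hashing sparse categorical features into embedding-table indices — under assumed names (`preprocess_record`, `num_buckets`); it is not the paper's actual pipeline.

```python
import math

def preprocess_record(dense, sparse, num_buckets=1 << 20):
    """Illustrative RecSys-style preprocessing (hypothetical, not the
    paper's implementation): normalize dense features and hash sparse
    categorical features into fixed-range bucket IDs."""
    # Log-normalization is commonly applied to heavy-tailed counter features
    dense_norm = [math.log(x + 1.0) if x > 0 else 0.0 for x in dense]
    # Hash each categorical string into an index for an embedding table
    sparse_ids = [hash(s) % num_buckets for s in sparse]
    return dense_norm, sparse_ids

features, ids = preprocess_record([3, 0, 15], ["ad_42", "user_7"])
```

Because this per-record work is simple but applied to enormous logs, it is memory- and compute-intensive in aggregate, which is what makes it a candidate for FPGA offload.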
Pub Date: 2023-11-20  DOI: 10.1109/LCA.2023.3334989
Redundant Array of Independent Memory Devices
Peiyun Wu; Trung Le; Zhichun Zhu; Zhao Zhang
IEEE Computer Architecture Letters, vol. 22, no. 2, pp. 181-184
DRAM reliability is an increasing concern, as recent studies have found. In this letter, we propose RAIMD (Redundant Array of Independent Memory Devices), an energy-efficient memory organization with RAID-like error protection. In this organization, each memory device works as an independent memory module that serves a whole memory request and supports error detection and error recovery. RAIMD relies on the high data rates of modern memory devices to minimize the performance impact of the increased data transfer time. It provides chip-level error protection similar to Chipkill but with significant energy savings. Our simulation results indicate that RAIMD saves memory energy by 26.3% on average with a small performance overhead of 5.3% on DDR5-4800 memory systems running SPEC2017 multi-core workloads.
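The RAID-like protection the abstract describes can be illustrated with the classic single-parity scheme: a parity block is the XOR of the data blocks on all devices, so the contents of any one failed device can be rebuilt from the survivors. This is a minimal sketch of that general principle only, not RAIMD's actual encoding or recovery mechanism.

```python
def make_parity(device_data):
    """Parity block = byte-wise XOR of every device's data block."""
    parity = bytearray(len(device_data[0]))
    for block in device_data:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving, parity):
    """Rebuild one failed device's block by XOR-ing the parity block
    with the blocks of all surviving devices."""
    rebuilt = bytearray(parity)
    for block in surviving:
        for i, b in enumerate(block):
            rebuilt[i] ^= b
    return bytes(rebuilt)

# Three devices, each holding a 2-byte block (toy example)
devices = [b"\x01\x02", b"\x0f\x10", b"\xaa\xbb"]
p = make_parity(devices)
# Simulate losing device 1 and rebuilding it from the others plus parity
assert recover([devices[0], devices[2]], p) == devices[1]
```

The energy argument in the letter follows from each device serving a whole request independently, so fewer chips are activated per access than in a conventional Chipkill-protected rank.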
Pub Date: 2023-11-17  DOI: 10.1109/LCA.2023.3333759
Haocong Luo;Yahya Can Tuğrul;F. Nisa Bostancı;Ataberk Olgun;A. Giray Yağlıkçı;Onur Mutlu
We present Ramulator 2.0, a highly modular and extensible DRAM simulator that enables rapid and agile implementation and evaluation of design changes in the memory controller and DRAM, supporting the growing research effort to improve the performance, security, and reliability of memory systems. Ramulator 2.0 abstracts and models the key components of a DRAM-based memory system and their interactions into shared interfaces
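The modular design described here — abstracting components behind shared interfaces so implementations can be swapped without touching their users — can be sketched as follows. All names (`DRAMInterface`, `SimpleDRAM`, `Controller`) are hypothetical illustrations of the pattern, not Ramulator 2.0's actual API.

```python
from abc import ABC, abstractmethod

class DRAMInterface(ABC):
    """Shared interface a memory controller programs against, so that
    different DRAM models plug in interchangeably (illustrative only)."""
    @abstractmethod
    def send(self, addr: int) -> bool:
        """Issue a request; return True if the device accepts it."""

class SimpleDRAM(DRAMInterface):
    """A trivial DRAM model that accepts every request."""
    def __init__(self):
        self.served = 0
    def send(self, addr: int) -> bool:
        self.served += 1
        return True

class Controller:
    """Depends only on the interface, never on a concrete DRAM model."""
    def __init__(self, dram: DRAMInterface):
        self.dram = dram
    def issue(self, addr: int) -> bool:
        return self.dram.send(addr)

dram = SimpleDRAM()
ctrl = Controller(dram)
ctrl.issue(0x1000)
```

Swapping in a new DRAM standard or a security study's modified timing model then only requires a new class implementing the same interface.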