{"title":"Building an Inference Server Platform for Large Language Models Using Dataflow PIM Platform","authors":"Kyu Hyun Choi, Taeho Hwang","doi":"10.1109/ICEIC61013.2024.10457213","DOIUrl":null,"url":null,"abstract":"Processing-in-Memory (PIM) has garnered attention as a platform for large language model inference due to its ability to perform computations within memory, leveraging the internal bandwidth of memory components. In data center environments, to execute AI models across multiple nodes, an inference server is typically deployed at the data center's frontend. This server orchestrates the assignment of AI inference tasks to the appropriate nodes. This paper presents the construction of an open source-based inference server designed for easy deployment of a PIM platform grounded in data flow architecture within a data center setting. We have conducted operational tests on large language models to validate the efficacy of our approach.","PeriodicalId":518726,"journal":{"name":"2024 International Conference on Electronics, Information, and Communication (ICEIC)","volume":"323 7","pages":"1-3"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 International Conference on Electronics, Information, and Communication (ICEIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIC61013.2024.10457213","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Processing-in-Memory (PIM) has garnered attention as a platform for large language model inference because it performs computations inside memory, exploiting the internal bandwidth of memory devices. In data center environments, an inference server is typically deployed at the data center's frontend to execute AI models across multiple nodes; this server orchestrates the assignment of AI inference tasks to the appropriate nodes. This paper presents the construction of an open-source inference server designed for easy deployment of a PIM platform based on a dataflow architecture in a data center setting. We conducted operational tests on large language models to validate the efficacy of our approach.
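
To make the frontend's orchestration role concrete, the following is a minimal Python sketch of the kind of task assignment such an inference server performs. It is not code from the paper: the names (PIMNode, Dispatcher, the node URLs) and the round-robin policy are illustrative assumptions standing in for whatever scheduling the authors' open-source server actually uses.

    # Illustrative sketch only: a minimal frontend dispatcher that assigns
    # incoming LLM inference requests to PIM-backed worker nodes.
    # All names and the round-robin policy are hypothetical assumptions,
    # not taken from the paper.

    from dataclasses import dataclass
    from itertools import cycle
    from typing import List

    @dataclass
    class PIMNode:
        """A worker node exposing a dataflow PIM accelerator."""
        url: str
        in_flight: int = 0  # requests currently assigned to this node

    class Dispatcher:
        """Round-robin assignment of inference tasks to nodes, as a
        stand-in for the scheduling a frontend inference server performs."""
        def __init__(self, nodes: List[PIMNode]):
            self._nodes = nodes
            self._rr = cycle(nodes)

        def assign(self, prompt: str) -> PIMNode:
            node = next(self._rr)
            node.in_flight += 1
            # A real server would serialize the request and forward it to
            # node.url over HTTP/gRPC; here we only record the assignment.
            return node

    if __name__ == "__main__":
        nodes = [PIMNode(f"http://pim-node-{i}:8000") for i in range(4)]
        dispatcher = Dispatcher(nodes)
        for i in range(8):
            node = dispatcher.assign(f"prompt {i}")
            print(f"request {i} -> {node.url}")

In a deployment, the assignment policy could account for per-node queue depth or model placement rather than simple rotation; the sketch only shows where such a decision sits in the request path.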