PriRecT: Privacy-preserving Job Recommendation Tool for GPU Sharing
Aritra Ray, Zhaobo Zhang, Ying Xiong, K. Chakrabarty
2022 IEEE Cloud Summit, October 2022. DOI: 10.1109/CloudSummit54781.2022.00021
Abstract
Machine Learning (ML) jobs benefit significantly from training on abundant GPU resources. When several ML training jobs are scheduled concurrently on a single GPU in a compute cluster, resource contention arises, and a job's performance becomes sensitive to the tasks it shares the GPU with. In this paper, we propose PriRecT, a novel ML job recommendation tool that preserves user privacy when scheduling ML training jobs in a GPU compute cluster. We perform workload characterization for several ML training scripts and publicly release the Futurewei mini-ML Workload Dataset [1]. Through a clustering-based approach, we build a knowledge base of inter- and intra-cluster task interference for GPU sharing. For scheduling, PriRecT blinds user-sensitive information and assigns each job to an existing cluster. Based on the clustering results, PriRecT recommends which jobs should run concurrently on a single GPU to minimize task interference, and it additionally assigns an uncertainty score to each recommendation to account for job-to-job variation.
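To make the abstract's pipeline concrete, the sketch below illustrates one plausible way such a clustering-based co-scheduling recommendation could be structured; it is not the authors' implementation. It assumes jobs are described by numeric workload-characterization features (e.g., GPU utilization, memory footprint), that user-sensitive fields have already been stripped ("blinded"), and that pairwise cluster interference scores are available from profiling. All names, feature choices, and the uncertainty heuristic are hypothetical.

```python
# Hypothetical sketch of a clustering-based co-scheduling recommender
# in the spirit of PriRecT (not the paper's actual method).

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Workload features for previously profiled (blinded) jobs:
# columns might be [gpu_util, gpu_mem, io_wait] -- illustrative only.
profiled_jobs = np.array([
    [0.9, 0.7, 0.2],
    [0.8, 0.6, 0.3],
    [0.2, 0.3, 0.9],
    [0.3, 0.2, 0.8],
    [0.5, 0.9, 0.1],
    [0.6, 0.8, 0.2],
])

scaler = StandardScaler()
X = scaler.fit_transform(profiled_jobs)

# Knowledge base: cluster the profiled workloads.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Hypothetical pairwise interference scores between clusters
# (lower = less slowdown when co-located on one GPU).
interference = np.array([
    [0.8, 0.2, 0.5],
    [0.2, 0.7, 0.3],
    [0.5, 0.3, 0.9],
])

def recommend(new_job_features):
    """Assign a blinded job to its nearest cluster, recommend the least
    interfering partner cluster, and report an uncertainty score based
    on the job's distance to the assigned centroid."""
    x = scaler.transform(np.asarray(new_job_features).reshape(1, -1))
    cluster = int(kmeans.predict(x)[0])
    # Co-locate with the cluster that interferes least with ours.
    partner = int(np.argmin(interference[cluster]))
    # Farther from the centroid -> less typical job -> higher uncertainty.
    dist = float(np.linalg.norm(x - kmeans.cluster_centers_[cluster]))
    members = X[kmeans.labels_ == cluster]
    max_dist = max(
        np.linalg.norm(members - kmeans.cluster_centers_[cluster], axis=1).max(),
        1e-9,
    )
    uncertainty = min(dist / max_dist, 1.0)
    return cluster, partner, uncertainty

cluster, partner, uncertainty = recommend([0.85, 0.65, 0.25])
print(f"assigned cluster {cluster}, co-locate with cluster {partner}, "
      f"uncertainty {uncertainty:.2f}")
```

In this toy setup, the incoming job only needs to expose aggregate workload features, so user-identifying details never enter the recommendation step; the uncertainty score simply flags jobs that sit far from their assigned cluster's centroid and are therefore riskier to co-schedule.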