Author: Minsik Cho
Venue: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Published: 2020-05-01
DOI: https://doi.org/10.1109/IPDPSW50202.2020.00166
Scalable Deep Learning Inference: Algorithmic Approach
Large-scale deep learning training has made significant progress in the last few years: more powerful systems and accelerators have been delivered (e.g., the Summit cluster), innovative training mechanisms have been designed (e.g., sophisticated hyperparameter tuning), and advanced communication techniques have been put to use (e.g., async-SGD). Deep learning inference, however, has rather limited options when it comes to scaling up the model density per device. Quantization to lower precision can help, as can sparsification such as pruning and compression, yet both are constrained by the underlying hardware architecture and by their practical efficacy.
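To make the two techniques named in the abstract concrete, here is a minimal sketch of symmetric int8 quantization and magnitude pruning in plain Python. It is an illustration only, not the method proposed in the paper; the function names (`quantize_int8`, `dequantize`, `magnitude_prune`) and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale.

    Assumption for illustration: a single scale for the whole tensor,
    chosen so the largest magnitude maps to 127.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical sample weights, chosen only for the demonstration.
weights = [0.5, -1.27, 0.03, 0.9, -0.002, 0.41]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
pruned = magnitude_prune(weights, 0.5)

print(codes)
print(pruned)
```

Storing one byte per weight instead of four (float32) shrinks the weight footprint by roughly 4x, which is what raises model density per device. The pruned list, by contrast, still occupies the same number of slots unless the zeros are stored in a sparse format and the hardware or kernels can skip them, which is one way to read the abstract's point about the underlying architecture.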