The rising performance demands and increasing heterogeneity in cloud data centers are driving a paradigm shift in cloud infrastructure, from monolithic servers to a disaggregated architecture. In a multi-tenant cloud, users should be able to leverage trusted computing to protect their applications from untrusted parties. While Trusted Execution Environments (TEEs) are a well-known technique for realizing trusted computing on monolithic servers, existing TEE technologies cannot be adopted directly in a disaggregated architecture because of its distributed nature and the heterogeneity of its devices. To address these challenges, we propose trusted heterogeneous disaggregated architectures, which allow cloud users to construct virtual TEEs (vTEEs): TEE-based, secure, isolated environments assembled from any combination of disaggregated components.
{"title":"Trusted Heterogeneous Disaggregated Architectures","authors":"Atsushi Koshiba, Felix Gust, Julian Pritzi, Anjo Vahldiek-Oberwagner, Nuno Santos, Pramod Bhatotia","doi":"10.1145/3609510.3609812","DOIUrl":"https://doi.org/10.1145/3609510.3609812","url":null,"abstract":"The rising performance demands and increasing heterogeneity in cloud data centers lead to a paradigm shift in the cloud infrastructure, from monolithic servers to a disaggregated architecture. In a multi-tenant cloud, users should be able to leverage trusted computing to protect their applications from untrusted parties. While Trusted Execution Environments (TEEs) are a well-known technique to realize trusted computing on monolithic servers, we cannot adopt existing TEE technologies to the disaggregated architecture due to their distributed nature and heterogeneity of devices. To address these challenges, we propose trusted heterogeneous disaggregated architectures, which allows cloud users to construct virtual TEEs (vTEEs): TEE-based, secure, isolated environments assembled with any combination of disaggregated components.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130710884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamed Husain Noor Mohamed, Xiaoguang Wang, B. Ravindran
Linux eBPF allows a userspace application to execute code inside the Linux kernel without modifying the kernel code or inserting a kernel module. An in-kernel eBPF verifier pre-verifies any untrusted eBPF bytecode before it runs in kernel context; currently, users trust the verifier to block malicious bytecode from being executed. This paper first studies the potential security issues behind existing eBPF-related CVEs. We then present a generation-based eBPF fuzzer that generates syntactically and semantically valid eBPF programs to find bugs in the verifier component of the Linux kernel eBPF subsystem. The fuzzer extends the Linux Kernel Library (LKL) project to run multiple lightweight Linux instances simultaneously, feeding them automatically generated eBPF instruction sequences. Using this fuzzer, we outperform the bpf-fuzzer [10] from the iovisor GitHub repository in both fuzzing speed and the rate at which generated programs pass the eBPF verifier (i.e., valid generated code). We also found two existing ALU range-tracking bugs in an older Linux kernel (v5.10).
{"title":"Understanding the Security of Linux eBPF Subsystem","authors":"Mohamed Husain Noor Mohamed, Xiaoguang Wang, B. Ravindran","doi":"10.1145/3609510.3609822","DOIUrl":"https://doi.org/10.1145/3609510.3609822","url":null,"abstract":"Linux eBPF allows a userspace application to execute code inside the Linux kernel without modifying the kernel code or inserting a kernel module. An in-kernel eBPF verifier pre-verifies any untrusted eBPF bytecode before running it in kernel context. Currently, users trust the verifier to block malicious bytecode from being executed. This paper studied the potential security issues from existing eBPF-related CVEs. Next, we present a generation-based eBPF fuzzer that generates syntactically and semantically valid eBPF programs to find bugs in the verifier component of the Linux kernel eBPF subsystem. The fuzzer extends the Linux Kernel Library (LKL) project to run multiple lightweight Linux instances simultaneously, with inputs from the automatically generated eBPF instruction sequences. Using this fuzzer, we can outperform the bpf-fuzzer [10] from the iovisor GitHub repository regarding fuzzing speed and the success rate of passing the eBPF verifier (valid generated code). We also found two existing ALU range-tracking bugs that appeared in an older Linux kernel (v5.10).","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133809539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zoned Namespace (ZNS) provides the Zone Append primitive to boost the write performance of ZNS SSDs via intra-zone parallelism. However, making Zone Append effective for a RAID array of multiple ZNS SSDs is non-trivial, since Zone Append offloads address management to the ZNS SSDs and thus requires the host to explicitly track RAID stripes across multiple drives. We propose ZapRAID, a high-performance software RAID layer for ZNS SSDs that carefully uses Zone Append to achieve high write parallelism with lightweight stripe management. ZapRAID's core idea is a group-based data layout with coarse-grained ordering across multiple groups of stripes, so that it can manage stripes with small-size metadata on a per-group basis. Our prototype evaluation shows that ZapRAID achieves a 2.34x write throughput gain compared with using the Zone Write primitive.
{"title":"ZapRAID: Toward High-Performance RAID for ZNS SSDs via Zone Append","authors":"Qiuping Wang, P. Lee","doi":"10.1145/3609510.3609810","DOIUrl":"https://doi.org/10.1145/3609510.3609810","url":null,"abstract":"Zoned Namespace (ZNS) provides the Zone Append primitive to boost the write performance of ZNS SSDs via intrazone parallelism. However, making Zone Append effective for a RAID array of multiple ZNS SSDs is non-trivial, since Zone Append offloads address management to ZNS SSDs and requires hosts to dedicatedly manage RAID stripes across multiple drives. We propose ZapRAID, a high-performance software RAID layer for ZNS SSDs by carefully using Zone Append to achieve high write parallelism and lightweight stripe management. ZapRAID's core idea is a group-based data layout with coarse-grained ordering across multiple groups of stripes, such that it can use small-size metadata for stripe management on a per-group basis. Our prototype evaluation shows that ZapRAID achieves a 2.34x write throughput gain compared with using the Zone Write primitive.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126421583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhida An, Ding Li, Yao Guo, Guijin Gao, Yuxin Ren, Ning Jia, Xinwei Hu
To achieve extremely high performance in HPC, many researchers have proposed customized operating systems tailored to HPC workload characteristics and emerging hardware. We therefore argue that HPC clusters will move away from a single-OS environment toward clusters running numerous heterogeneous OSes. However, existing HPC cluster management still assumes that all nodes are equipped with the same OS and fails to consider OS heterogeneity during job scheduling; this unawareness forfeits most of the performance benefits that specialized OSes provide. This paper quantitatively investigates the consequences of ignoring OS heterogeneity in current HPC cluster management and analyzes the performance trade-offs among heterogeneous OSes. Preliminary results on a variety of HPC OSes and applications confirm the performance penalty incurred by the existing cluster scheduler. We then propose a cluster scheduler prototype that incorporates OS heterogeneity into cluster configuration, resource monitoring, and job placement, and we present open challenges for future research on OS-heterogeneity-aware HPC clusters.
{"title":"Towards OS Heterogeneity Aware Cluster Management for HPC","authors":"Zhida An, Ding Li, Yao Guo, Guijin Gao, Yuxin Ren, Ning Jia, Xinwei Hu","doi":"10.1145/3609510.3609819","DOIUrl":"https://doi.org/10.1145/3609510.3609819","url":null,"abstract":"To achieve extremely high performance in HPC, many researchers have proposed customized operating systems that are tailored to HPC workload characteristics and emerging hardware. Hence, we argue that the HPC cluster will move away from the single OS environment to a cluster with numerous heterogeneous OSes. However, existing HPC cluster management still assumes that all nodes are equipped with the same OS and fails to consider OS heterogeneity during job scheduling. As a result, such unawareness loses most performance benefits provided by specialized OSes. This paper quantitatively investigates the problem of ignoring OS heterogeneity in the current HPC cluster management and analyzes performance trade-offs inside heterogeneous OSes. Preliminary results on a variety of HPC OSes and applications confirm the performance penalty of the existing cluster scheduler. We then propose a cluster scheduler prototype that incorporates OS heterogeneity into cluster configuration, resource monitoring, and job placement. We also present open challenges for future research on OS heterogeneity aware HPC clusters.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131325449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We report on our initial effort to formally verify the seL4 Core Platform, an OS framework for the verified seL4 microkernel. This includes a formal specification of the seL4 Core Platform library, an automated proof of its functional correctness, and a verified mapping of the seL4 Core Platform's System Description to the CapDL formalism that describes seL4 access rights and enables verified system initialisation.
{"title":"First steps in verifying the seL4 Core Platform","authors":"Mathieu Paturel, Isitha Subasinghe, G. Heiser","doi":"10.1145/3609510.3609821","DOIUrl":"https://doi.org/10.1145/3609510.3609821","url":null,"abstract":"We report on our initial effort to formally verify the seL4 Core Platform, an OS framework for the verified seL4 microkernel. This includes a formal specification of the seL4 Core Platform library, an automated proof of its functional correctness, and a verified mapping of the seL4 Core Platform's System Description to the CapDL formalism that describes seL4 access rights and enables verified system initialisation.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121315357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Malware classification is helpful for malware detection and analysis, and family classification of malware is a multi-class classification task. Many studies have exploited API call sequences as malware features. However, API call sequences do not explicitly express the control structures between API calls, which may represent malware behavior more accurately. In this paper, we propose a novel malware family classification method. We model each malware sample as a behavioral tree built from the API call sequence obtained by dynamic analysis; the tree describes the control structure between API calls. To reduce computational complexity, we capture a set of binary relations, called Heighted Behavior Relations, from the behavior tree as the sample's behavior features. TF-IDF is used to derive family-level behavior features from the per-sample features. The similarity vector of each sample is then constructed from its similarity to every family. For family classification, the similarity vectors are fed into a Naive Bayes classifier for training. Experiments on a dataset of 10,620 malware samples from 43 families show that the classification accuracy of our approach is 10% higher than that of classical methods based on API call sequences.
{"title":"Family Classification based on Tree Representations for Malware","authors":"Yang Xu, Zhuotai Chen","doi":"10.1145/3609510.3609818","DOIUrl":"https://doi.org/10.1145/3609510.3609818","url":null,"abstract":"Malware classification is helpful for malware detection and analysis. Family classification of malware is a multi-classification task. Many studies have exploited API call sequences as malware features. However, API call sequences do not explicitly express the information about control structures between API calls, which may be useful to represent malware behavior features more accurately. In this paper, we propose a novel malware familial classification method. We model each malware as a Behavioral Tree from API call sequence obtained from dynamic analysis, which describes the control structure between the API calls. To reduce the computational complexity, we capture a set of binary relations, called as Heighted Behavior Relations, from the behavior tree as behavior features of malware. The TF-IDF technology is used to calculate the family behavior features from the behavior features of malware. Then the similarity vector of each malware is constructed based on the similarity between it and all the families. For family classification purpose, the similarity vectors of malware are fed into Naive Bayes algorithm to train a classifier. The experiments on dataset with 10620 malware samples from 43 malware families show that the classification accuracy of our approach is 10% higher than that of the classical methods based on API call sequences.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121492622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the emergence of machine learning, many commercial companies increasingly use machine learning inference systems as backend services to improve their products. Serverless computing is a modern paradigm that provides auto-scaling, event-driven services, making it well suited to domains such as video stream analysis, IoT serving, and machine learning applications; its flexible scaling is adept at handling the burstiness of ML workloads. However, despite this compatibility with ML inference tasks, the cost of serverless inference systems remains relatively high compared with traditional serving paradigms, primarily due to under-utilization of the CPU resources offered by serverless platforms. To tackle this challenge, we design and deploy a serverless inference serving system that incorporates batching and multi-processing mechanisms to improve cost efficiency. By applying a change-point detection algorithm to manage bursty workloads, it optimizes resource usage and achieves lower costs. We employ an Amazon EC2 server to handle request packaging and to run the core Bayesian Optimization algorithm without any prior information. The preliminary system, implemented on AWS Lambda, significantly reduces expenses, saving up to 62% compared with the original serverless inference system.
{"title":"Cost-Efficient Serverless Inference Serving with Joint Batching and Multi-Processing","authors":"Shen Cai, Zhi Zhou, Kongyange Zhao, Xu Chen","doi":"10.1145/3609510.3609816","DOIUrl":"https://doi.org/10.1145/3609510.3609816","url":null,"abstract":"With the emerging of machine learning, many commercial companies increasingly utilize machine learning inference systems as backend services to improve their products. Serverless computing is a modern paradigm that provides auto-scaling, event-driven services, making it particularly well-suited for various domains, including video stream analysis, IoT serving and machine learning applications. The flexible scaling feature of serverless computing is adept at handling the burstiness of ML workloads. However, despite its compatibility with ML inference tasks, the cost of serverless inference systems remain relatively high in comparison to traditional serving paradigms, primarily due to the under-utilization of CPU resources offered by serverless platforms. To tackle this challenge, we design and deploy a serverless inference serving system that incorporates batching and multi-process mechanisms to enhance cost efficiency. By applying a change-point detection algorithm to manage bursty workloads, it optimizes resource usage and achieves lower costs. We employ an Amazon EC2 server for handling request packaging and running the core Bayesian Optimization algorithm without any prior information. The preliminary system, implemented on AWS Lambda, can significantly reduce expenses and save up to 62% compared to the original serverless inference system.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129600020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Georgios C. Androutsopoulos, Giorgos Kappes, S. Anastasiadis
There is increasing interest in quantifying and improving the isolation that containers provide to competing applications on multitenant hosts. As a first step toward addressing this need, we introduce several metrics that quantify an application's exposure to the source code of the kernel subsystems. Building on existing tracing tools, we develop a common framework and two toolchains that automate the extraction of the metrics. We experimentally compare the tracing accuracy of the toolchains by computing the metrics across different workloads, and demonstrate the importance of separating application execution from unrelated system activity.
{"title":"Quantifying the Security Profile of Linux Applications","authors":"Georgios C. Androutsopoulos, Giorgos Kappes, S. Anastasiadis","doi":"10.1145/3609510.3609814","DOIUrl":"https://doi.org/10.1145/3609510.3609814","url":null,"abstract":"There is an increasing interest to quantify and improve the isolation provided by containers to competing applications on multitenant hosts. As a first step to address this need, we introduce several metrics that quantify the exposure of the applications to the source code of the kernel subsystems. Based on existing tracing tools, we develop a common framework and build two toolchains that automate the extraction of the metrics. We experimentally compare the tracing accuracy of the toolchains by calculating the metrics across different workloads and demonstrate the importance of separating the application execution from unrelated system activity.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126787270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Containers, which have evolved primarily on Linux, have become a significant trend in the cloud thanks to their lightweight virtualization and growing, convenient ecosystem. However, the laxer isolation of containerization also exposes attack surfaces in the underlying Linux kernel. Unfortunately, combining containers with other virtualization techniques for sandboxing, such as traditional VMs or interposition by an application kernel, can spoil their lightweight and scalable nature. In this study, we propose a different approach to lightweight sandboxing, one that exploits the fact that attackers mostly assume containers run on Linux. It averts major vulnerability exploits derived from Linux by transplanting Linux containers onto the FreeBSD kernel. Furthermore, it fortifies isolation by transparently applying "Capsicum," a capability-based sandbox mechanism native to FreeBSD and nonstandard on Linux, to the transplanted containers. This paper analyzes the vulnerabilities Linux containers face, identifies the technical issues in transplanting Linux containers onto FreeBSD, and designs a mechanism that transparently applies the Capsicum sandbox to Linux applications, exploring the feasibility of our approach.
{"title":"Reducing Attack Surface with Container Transplantation for Lightweight Sandboxing","authors":"Yuki Nakata, Shintaro Suzuki, Katsuya Matsubara","doi":"10.1145/3609510.3609820","DOIUrl":"https://doi.org/10.1145/3609510.3609820","url":null,"abstract":"Containers, which have evolved in Linux primarily, have become a significant trend in the cloud due to their lightweight virtualization and growing convenient ecosystem. However, the laxer isolation of containerization also introduces attack surfaces on the underlying Linux kernel. Unfortunately, combining other virtualizations, such as the traditional VM and interposition by application kernel, for sandboxing could spoil the lightweight and scalable nature of the containers. In this study, we propose another approach to lightweight sandboxing that focuses on the fact that such attackers have mostly assumed containers rely on Linux. It can avert major vulnerability exploits derived from Linux by transplanting Linux containers onto the FreeBSD kernel. Furthermore, it can fortify the isolation by transparently applying \"Capsicum,\" a unique sandbox mechanism that is nonstandard in Linux, to the transplanted containers. This paper analyzes vulnerabilities faced by Linux containers, identifies technical issues in transplanting Linux containers onto FreeBSD, and designs a mechanism to transparently apply the Capsicum sandbox to Linux applications to explore the feasibility of our approach.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134628060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Baek, Jonghyun Bae, Donghyun Lee, Hyun-Cheol Bae, Yeonhong Park, Jae W. Lee
Today's deep neural network (DNN) training pipelines utilize hardware resources holistically: host CPUs and storage devices preprocess the input data, while accelerators such as GPUs compute gradients. As accelerator performance scales rapidly, the frontend data preparation stages are becoming a new bottleneck that yields suboptimal training throughput. Since the pipeline bottleneck can vary with the hardware configuration, DNN model, and dataset, overprovisioning data preparation resources such as CPU cores and disk bandwidth is not a cost-effective solution. Instead, we make a case for leveraging multiple data formats, possibly with opposing resource-utilization characteristics, to balance the training pipeline. This idea is realized in Liquid, a new system for building efficient training pipelines with multi-format datasets. Our evaluation on three distinct execution environments demonstrates that Liquid achieves up to 3.05x and 1.54x higher data preparation throughput on the Cityscapes/CityPersons (PNG) and ImageNet (JPEG) datasets, respectively, over a baseline single-format pipeline. This translates into up to 2.02x and 1.25x higher end-to-end training throughput (geometric mean) with no accuracy drop.
{"title":"Liquid: Mix-and-Match Multiple Image Formats to Balance DNN Training Pipeline","authors":"W. Baek, Jonghyun Bae, Donghyun Lee, Hyun-Cheol Bae, Yeonhong Park, Jae W. Lee","doi":"10.1145/3609510.3609811","DOIUrl":"https://doi.org/10.1145/3609510.3609811","url":null,"abstract":"Today's deep neural network (DNN) training pipeline utilizes hardware resources holistically, including host CPUs and storage devices for preprocessing the input data and accelerators like GPUs for computing gradients. As the performance of the accelerator scales rapidly, the frontend data preparation stages are becoming a new performance bottleneck to yield suboptimal training throughput. Since the bottleneck in the pipeline may vary depending on hardware configurations, DNN models, and datasets, overprovisioning hardware resources for data preparation such as CPU cores and disk bandwidth is not a cost-effective solution. Instead, we make a case for leveraging multiple data formats, possibly with opposing characteristics in resource utilization, to balance the training pipeline. This idea is realized by Liquid, a new system for building an efficient training pipeline with multi-format datasets. Our evaluation on three distinct execution environments demonstrates that Liquid achieves up to 3.05x and 1.54x higher data preparation throughput on Cityscapes/CityPersons (PNG) and ImageNet (JPEG) datasets, respectively, over the baseline single-format pipeline. This leads up to 2.02x and 1.25x higher end-to-end geomean training throughput with no accuracy drop.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130168735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}