Towards complete dis-aggregation of data center rack power using light-weight mechanisms
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00052 | IEEE Cloud Computing, pp. 299-308
Kalyan Dasgupta, Umamaheswari Devi, Aanchal Goyal
Enterprises worldwide are increasingly prioritizing sustainability due to the growing focus on carbon neutrality as well as the need to comply with emerging, strict government regulations across the globe. With many enterprise workloads deployed in clouds and data centers, it is becoming inevitable for cloud providers and data center operators to quantify each client’s share of the total carbon emission from their facility in order to fulfill their clients’ mandatory carbon-reporting requirements. Accurate carbon quantification requires power measurements at the lowest level of the hardware infrastructure, such as physical servers and network switches. However, power sensing is quite limited in many data centers, with measurements normally available only at an aggregated level such as the rack level. To drill down to the level of a workload and capture the correct power usage per workload, it is essential to dis-aggregate this power across servers. In this paper, we propose a software-based non-linear model that uses the Newton-Raphson method to estimate the power-model parameters of individual servers from server utilizations, given only the overall rack-level power measurements. The methodology is applicable to data centers with multiple types of servers in a rack and is light-weight in the sense that it does not require mechanisms such as shutting down individual servers to estimate idle power. The method is also generalized to account for the real-world scenario where the time granularities of rack power and server utilization measurements may not match. We have conducted detailed evaluations of the proposed methods and find good convergence of the parameter estimation even when tested with multiple different initial conditions.
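The abstract does not spell out the power model or the solver details. The minimal sketch below assumes a per-server model P_i(u) = idle + b_i * u^c_i, with the idle terms lumped into a single aggregate constant A for identifiability in this toy setting (the paper's method recovers per-server parameters), and fits the parameters to rack-level readings and per-server utilizations with a damped Gauss-Newton (Newton-Raphson-style) iteration. All names and the toy data are illustrative.

```python
# Illustrative sketch only: the paper's exact power model and solver are not given in the
# abstract. Assumed model: P_rack(t) ~ A + sum_i b_i * u_i(t)**c_i, where A lumps the idle
# power and (b_i, c_i) are per-server dynamic-power parameters, fitted by damped
# Gauss-Newton (a Newton-Raphson variant for nonlinear least squares).
import numpy as np

def fit_rack_power(util, rack_power, iters=100, damping=1e-3):
    """util: (T, N) per-server utilization in (0, 1]; rack_power: (T,) watts."""
    T, N = util.shape
    u = np.clip(util, 1e-3, 1.0)                 # avoid log(0) in the Jacobian
    A = 0.5 * rack_power.min()                   # aggregate idle-power guess
    b = np.full(N, (rack_power.mean() - A) / N)  # per-server dynamic-range guesses
    c = np.ones(N)                               # per-server exponent guesses
    for _ in range(iters):
        pow_uc = u ** c                                  # (T, N)
        r = A + (b * pow_uc).sum(axis=1) - rack_power    # residuals, (T,)
        # Jacobian of r w.r.t. the parameter vector [A, b_1..b_N, c_1..c_N]
        J = np.hstack([np.ones((T, 1)), pow_uc, b * pow_uc * np.log(u)])
        H = J.T @ J + damping * np.eye(2 * N + 1)        # damped normal equations
        step = np.linalg.solve(H, J.T @ r)
        A, b, c = A - step[0], b - step[1:N + 1], c - step[N + 1:]
        c = np.clip(c, 0.5, 3.0)                         # keep exponents physically plausible
    return A, b, c

# Toy check with synthetic data for 3 servers
rng = np.random.default_rng(0)
util = rng.uniform(0.05, 0.95, size=(2000, 3))
true_b, true_c = np.array([120.0, 90.0, 150.0]), np.array([1.2, 1.0, 1.4])
rack = 210.0 + (true_b * util ** true_c).sum(axis=1) + rng.normal(0, 2, 2000)
print(fit_rack_power(util, rack))
```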
Xonar: Profiling-based Job Orderer for Distributed Deep Learning
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00030 | IEEE Cloud Computing, pp. 112-114
Changyong Shin, Gyeongsik Yang, Yeonho Yoo, J. Lee, C. Yoo
Deep learning models vary widely in GPU execution time and memory footprint. When running distributed training jobs, however, their GPU execution time and memory size have not been taken into account, which leads to high variance in job completion time (JCT). Moreover, jobs often run into the GPU out-of-memory (OoM) problem, so that the unlucky job has to restart from scratch. To address these problems, we propose Xonar, which profiles deep learning jobs and orders them in the queue. Experiments show that Xonar with TensorFlow v1.6 reduces the tail JCT by 44% while eliminating the OoM problem.
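Xonar's exact ordering policy is not given in the abstract. The sketch below only illustrates the general idea under stated assumptions: jobs carry hypothetical profiled fields (peak GPU memory, estimated run time), jobs that would not fit in free GPU memory are deferred to avoid OoM restarts, and the rest are ordered shortest-first to reduce tail JCT.

```python
# Minimal sketch (assumptions, not Xonar's actual policy): order queued jobs using
# profiled GPU memory and run time, dispatching only jobs whose profiled memory fits the
# free GPU memory (to avoid OoM restarts) and preferring shorter jobs to cut tail JCT.
from dataclasses import dataclass

@dataclass
class JobProfile:
    name: str
    gpu_mem_gb: float      # profiled peak GPU memory
    est_minutes: float     # profiled estimated training time

def order_queue(jobs, free_gpu_mem_gb):
    fits = [j for j in jobs if j.gpu_mem_gb <= free_gpu_mem_gb]
    too_big = [j for j in jobs if j.gpu_mem_gb > free_gpu_mem_gb]
    # Shortest-estimated-time first among feasible jobs; infeasible jobs wait.
    return sorted(fits, key=lambda j: j.est_minutes), too_big

queue = [JobProfile("resnet50", 10, 90), JobProfile("bert-large", 28, 240),
         JobProfile("lstm", 4, 30)]
runnable, deferred = order_queue(queue, free_gpu_mem_gb=16)
print([j.name for j in runnable], [j.name for j in deferred])
```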
Cloud Data Center Fabric Virtualization
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00048 | IEEE Cloud Computing, pp. 263-272
Ali Sydney, A. Alim, Chris Ward, C. Basso, B. Karaçali
Cloud networks support workloads with diverse characteristics. A key challenge facing cloud providers is how to meet the stringent performance and security needs of these diverse applications running over a shared infrastructure. To address this challenge, some providers build for peak capacity or even build dedicated clusters with specialized networks, which may be under-utilized at times. We propose a virtualization approach that customizes data center network resources to the needs of applications. Our approach is based on slicing data center network resources on demand and customizing these slices to target workloads. Such slices can grow or shrink dynamically and programmatically based on workload demands. This elasticity provides a more efficient solution than building dedicated clusters. In our approach, a slice carved out of the underlying network can be customized to a given set of workloads with similar security and performance requirements. It leverages a software-defined underlay network controller and segment routing for fine-grained path control and service chaining. We have implemented a prototype of our fabric virtualization solution based on network slicing. In this paper, we first present the architecture of our prototype. Second, we present empirical results of slice provisioning times in networks of varying sizes and switch operating systems. The empirical results indicate that our prototype can support slice provisioning on the order of tens to hundreds of seconds and can meet the provisioning requirements of production networks.
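The paper's prototype API is not described in the abstract. As a rough illustration of the slicing idea only, the hypothetical Slice structure below groups workloads with similar security and performance needs and pins their traffic to an explicit segment-routing path (a SID list) that an underlay SDN controller would install; all names and fields are assumptions.

```python
# Hypothetical sketch of the slicing idea (names and fields are illustrative, not the
# paper's prototype): a slice groups workloads with similar security/performance needs and
# steers their traffic over an explicit segment-routing path; growing or shrinking a slice
# simply edits its membership.
from dataclasses import dataclass, field

@dataclass
class Slice:
    name: str
    bandwidth_gbps: float
    isolation: str                      # e.g. "strict" or "shared"
    sid_list: list[int]                 # segment IDs defining the underlay path
    members: set[str] = field(default_factory=set)

    def attach(self, workload: str):
        self.members.add(workload)

    def detach(self, workload: str):
        self.members.discard(workload)

# Example: a latency-sensitive slice steered over segments 16001 -> 16005 -> 16009
ml_slice = Slice("ml-training", bandwidth_gbps=40, isolation="strict",
                 sid_list=[16001, 16005, 16009])
ml_slice.attach("tenant-a/trainer-0")
ml_slice.attach("tenant-a/trainer-1")
print(ml_slice)
```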
A Data-Loader Tunable Knob to Shorten GPU Idleness for Distributed Deep Learning
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00068 | IEEE Cloud Computing, pp. 449-458
Danlin Jia, Geng Yuan, Xue Lin, N. Mi
Deep Neural Networks (DNNs) have been applied as effective machine learning models to tackle problems in many domains. However, training a sophisticated DNN model takes days to weeks and has become a bottleneck for research on large-scale DNN models. Distributed Deep Learning (DDL) accelerates DNN training by distributing training workloads across multiple computation accelerators (e.g., GPUs). Although a surge of research has been devoted to optimizing DDL training, the impact of data-loading on GPU usage and training performance has been relatively under-explored. It is non-trivial to optimize data-loading in DDL applications, which need intensive CPU and I/O resources to process enormous training data. When multiple DDL applications are deployed on a system (e.g., cloud or HPC), the lack of a practical and efficient technique for data-loader allocation incurs GPU idleness and degrades training throughput. Therefore, our work first investigates the impact of data-loading on global training throughput. We then propose a throughput prediction model to predict the maximum throughput of an individual DDL training application. Leveraging the predicted results, A-Dloader dynamically allocates CPU and I/O resources to concurrently running DDL applications and uses the data-loader allocation as a knob to reduce GPU idle intervals and thus improve overall training throughput. We implement and evaluate A-Dloader in a DDL framework for a series of DDL applications arriving and completing over the runtime. Our experimental results show that A-Dloader achieves a 23.5% throughput improvement and a 10% makespan improvement compared to allocating resources evenly across applications.
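The throughput prediction model itself is not detailed in the abstract. Assuming each job's predicted throughput is available as a function of the number of data-loader CPU workers, the sketch below shows one plausible way to use such predictions as an allocation knob: hand out cores greedily by marginal throughput gain and stop feeding a job once it reaches its predicted maximum, where the GPU rather than data loading is the bottleneck. This is not the paper's algorithm.

```python
# Sketch under assumptions (not A-Dloader's actual algorithm): given each job's predicted
# throughput as a function of the number of data-loader CPU workers, greedily hand out
# cores one at a time to the job with the largest marginal throughput gain, capped at the
# point where data loading saturates the GPU (the predicted maximum throughput).
def allocate_loader_cores(predicted, total_cores):
    """predicted: {job: [throughput with 0,1,2,... workers]} (non-decreasing lists)."""
    alloc = {job: 0 for job in predicted}
    for _ in range(total_cores):
        best_job, best_gain = None, 0.0
        for job, curve in predicted.items():
            k = alloc[job]
            if k + 1 < len(curve):
                gain = curve[k + 1] - curve[k]
                if gain > best_gain:
                    best_job, best_gain = job, gain
        if best_job is None:          # every job is already at its predicted maximum
            break
        alloc[best_job] += 1
    return alloc

# Toy predicted curves (samples/sec) for two concurrent DDL jobs
curves = {"jobA": [0, 300, 550, 700, 760, 780, 780],
          "jobB": [0, 200, 380, 500, 560, 590, 600]}
print(allocate_loader_cores(curves, total_cores=8))
```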
Q-percentile Bandwidth Billing Based Geo-Scheduling Algorithm
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00042 | IEEE Cloud Computing, pp. 219-229
Yaoyin You, Binbin Feng, Zhijun Ding
Current IaaS providers deploy cheaper computing resources in newly built data centers and provide cross-regional network services to improve the interoperability of computing resources across regions. Third-party service providers can use part of their budget to purchase cross-regional communication resources in order to use cheaper resources in remote regions and thus reduce the cost of processing massive task requests. The Q-percentile charging model is widely used for billing cross-regional communication resources, but there is little task-scheduling research on this billing method. Therefore, this paper studies a geo-distributed task scheduling scenario under the Q-percentile charging model. We design a geo-scheduling algorithm specifically for the Q-percentile charging model that allocates resources along the two dimensions of computing resources and communication resources. Furthermore, referring to three existing communication resource allocation strategies, we design three bandwidth allocation algorithms that consider the Q-percentile charging characteristics, providing suitable solutions for different scenarios. We conducted experiments based on well-known public datasets such as the LIGO workflow. Results show that, compared with the baseline, the proposed scheduling algorithm can reduce the task scheduling cost between geo-distributed data centers by 10%-20% under various task loads, and they reveal differences in the applicability of the different communication resource allocation strategies.
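For readers unfamiliar with Q-percentile (e.g., 95th-percentile) bandwidth billing, the short worked example below computes a bill the standard way: sort the usage samples over the billing period, drop the top (100-Q)%, and charge for the highest remaining sample. The discarded bursty intervals are effectively free, which is exactly what a burst-aware scheduler can exploit. Function name and prices are illustrative, not from the paper.

```python
# Worked sketch of Q-percentile bandwidth billing: samples of link usage are sorted, the
# top (100-Q)% are discarded, and the bill is the highest remaining sample times the unit
# price. Intervals above the Q-th percentile are effectively free.
def q_percentile_bill(samples_mbps, q=95, price_per_mbps=0.5):
    ranked = sorted(samples_mbps)
    idx = max(0, int(len(ranked) * q / 100) - 1)   # index of the Q-th percentile sample
    billable = ranked[idx]
    return billable, billable * price_per_mbps

# Toy month: 8640 five-minute samples, mostly ~100 Mbps with 400 bursty intervals at 900 Mbps
samples = [100.0] * 8240 + [900.0] * 400
billable, cost = q_percentile_bill(samples)
# 400 bursts < 5% of 8640 (= 432) samples, so every burst falls above the 95th percentile
print(billable, cost)   # bills at 100 Mbps despite the 900 Mbps bursts
```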
Building Golden Signal Based Signatures for Log Anomaly Detection
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00040 | IEEE Cloud Computing, pp. 203-208
Seema Nagar, Suranjana Samanta, P. Mohapatra, Debanjana Kar
As an increasing number of organizations migrate to the cloud, the main challenge for an operations team is how to effectively use the overwhelming amount of information derivable from multiple data sources, such as logs, metrics, and traces, to help maintain the robustness and availability of cloud services. Site Reliability Engineers (SREs) depend on periodic log data to understand the state of an application and to diagnose the potential root cause of a problem. Despite best practices, service outages happen and result in the loss of billions of dollars in revenue. Often, the indicators of these outages are buried in the flood of alerts that an SRE receives. It is therefore important to reduce noisy alerts so that an SRE can focus on what is critical. Log anomaly detection identifies anomalous system behaviours and finds patterns (anomalies) in data that do not conform to expected behaviour. Different anomaly detection techniques have been incorporated into various AIOps platforms, but they all suffer from a large number of false positives; moreover, some anomalies are transient and resolve on their own. In this paper, we propose an unsupervised, model-agnostic persistent anomaly detector based on golden-signal-based signatures, applied as a post-processing filtering step on detected anomalies, so that the anomaly detector already deployed in a system does not need to be modified.
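The signature construction is not detailed in the abstract. The hedged sketch below shows one way such a post-processing filter could work, under the assumption that an alert is kept only if it persists for several consecutive windows and coincides with a significant deviation in at least one golden signal (latency, traffic, errors, saturation); thresholds and inputs are illustrative.

```python
# Hedged sketch of the post-processing idea (not the paper's signature construction):
# keep a raw log-anomaly alert only if it persists across several consecutive windows AND
# coincides with a deviation in at least one golden signal, filtering transient false positives.
def persistent_anomalies(raw_alerts, golden, persist_windows=3, z_threshold=3.0):
    """raw_alerts: list of 0/1 per window; golden: {signal: list of z-scores per window}."""
    kept = []
    for t in range(len(raw_alerts)):
        window = raw_alerts[max(0, t - persist_windows + 1): t + 1]
        persists = len(window) == persist_windows and all(window)
        golden_hit = any(abs(z[t]) >= z_threshold for z in golden.values())
        kept.append(1 if (persists and golden_hit) else 0)
    return kept

raw = [0, 1, 0, 1, 1, 1, 1, 0]
golden = {"latency_z": [0.2, 0.5, 0.1, 1.0, 3.4, 3.8, 3.1, 0.4],
          "error_rate_z": [0.0, 0.1, 0.0, 0.2, 0.3, 0.9, 1.1, 0.2]}
print(persistent_anomalies(raw, golden))   # only windows 5 and 6 survive the filter
```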
Automated Configuration for Agile Software Environments
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00074 | IEEE Cloud Computing, pp. 511-521
Negar Mohammadi Koushki, Sanjeev Sondur, K. Kant
The increasing use of the DevOps paradigm in software systems has substantially increased the frequency of configuration parameter setting changes. Ensuring the correctness of such settings is generally a very challenging problem due to the complex interdependencies, and calls for an automated mechanism that can both run quickly and provide accurate settings. In this paper, we propose an efficient discrete combinatorial optimization technique that makes two unique contributions: (a) an improved and extended metaheuristic that exploits the application domain knowledge for fast convergence, and (b) the development and quantification of a discrete version of the classical tunneling mechanism to improve the accuracy of the solution. Our extensive evaluation using available workload traces that do include configuration information shows that the proposed technique can provide a lower-cost solution (by ~60%) with faster convergence (by ~48%) as compared to the traditional metaheuristic algorithms. Also, our solution succeeds in finding a feasible solution in approximately 30% more cases than the baseline algorithm.
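The paper's metaheuristic and its discrete tunneling step are not specified in the abstract. The generic sketch below, with illustrative parameter names and a toy cost function, combines first-improvement local search over discrete configuration settings with a tunneling phase that jumps several parameters at once and accepts only configurations no worse than the best cost found, then resumes local search; it is a sketch of the general technique, not the paper's algorithm.

```python
# Generic sketch (assumptions; not the paper's exact metaheuristic): first-improvement
# local search over discrete configuration settings, plus a discrete "tunneling" phase
# that, once stuck in a local minimum, makes multi-parameter jumps and accepts only
# configurations at least as good as the best cost seen so far.
import random

def local_search(x, domains, cost):
    improved = True
    while improved:
        improved = False
        for i, dom in enumerate(domains):
            for v in dom:
                if v != x[i]:
                    y = x[:i] + [v] + x[i + 1:]
                    if cost(y) < cost(x):
                        x, improved = y, True
    return x

def tunnel_and_search(domains, cost, rounds=20, jumps=200, jump_size=3, seed=0):
    rng = random.Random(seed)
    x = local_search([rng.choice(d) for d in domains], domains, cost)
    best, best_c = x, cost(x)
    for _ in range(rounds):
        for _ in range(jumps):                       # discrete tunneling phase
            y = best[:]
            for i in rng.sample(range(len(domains)), k=min(jump_size, len(domains))):
                y[i] = rng.choice(domains[i])
            if cost(y) <= best_c:                    # tunnel only to equal-or-better points
                x = local_search(y, domains, cost)
                if cost(x) < best_c:
                    best, best_c = x, cost(x)
                break
    return best, best_c

# Toy: choose 4 configuration parameters, each from 0..9, minimizing a rugged surrogate cost
domains = [list(range(10))] * 4
cost = lambda x: (x[0] - 7) ** 2 + (x[1] * x[2] - 12) ** 2 + abs(x[3] - x[0]) + (x[1] % 3)
print(tunnel_and_search(domains, cost))
```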
Distributed online extraction of a fluid model for microservice applications using local tracing data
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00037 | IEEE Cloud Computing, pp. 179-190
Johan Ruuskanen, A. Cervin
Dynamic resource management is a difficult problem in modern microservice applications. Many proposed methods rely on the availability of an analytical performance model, often based on queueing theory. Such models can always be hand-crafted, but this takes time and requires expert knowledge. Various methods have been proposed that can automatically extract models from logs or tracing data. However, they are often intricate, requiring off-line stages and advanced algorithms for retrieving the service-time distributions. Furthermore, the resulting models can be complex and unsuitable for online evaluation. Aiming for simplicity, in this paper we introduce a general queueing network model for microservice applications that can be (i) quickly and accurately solved using a refined mean-field fluid model and (ii) completely extracted at runtime in a distributed fashion from common local tracing data at each service. The fit of the model and its prediction accuracy under system perturbations are evaluated in a cloud-based microservice application, and the model is found to be accurate.
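The paper's refined mean-field model is more elaborate than can be reproduced from the abstract; the sketch below only illustrates the basic fluid idea for a single service with c worker threads, arrival rate lam, and per-thread service rate mu (all quantities that local tracing data can supply), integrating dx/dt = lam - mu*min(x, c) with forward Euler. Names and numbers are illustrative.

```python
# Basic illustration of the fluid-model idea (the paper uses a refined mean-field model;
# this sketch shows only the plain first-order version): for a service with c worker
# threads, arrival rate lam, and per-thread service rate mu, the expected number of
# in-flight requests x(t) follows dx/dt = lam - mu * min(x, c).
def fluid_queue(lam, mu, c, x0=0.0, t_end=30.0, dt=0.01):
    x, t, trace = x0, 0.0, []
    while t < t_end:
        x += dt * (lam - mu * min(x, c))   # forward-Euler step of the fluid ODE
        t += dt
        trace.append((round(t, 2), x))
    return trace

# Example: 80 req/s arriving at a service with 10 threads, each serving 10 req/s
trace = fluid_queue(lam=80.0, mu=10.0, c=10)
print(trace[-1])   # settles near x = lam/mu = 8 in-flight requests (utilization 0.8)
```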
GenoPPML – a framework for genomic privacy-preserving machine learning
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00076 | IEEE Cloud Computing, pp. 532-542
Sergiu Carpov, Nicolas Gama, Mariya Georgieva, Dimitar Jetchev
We present a framework GenoPPML for privacy-preserving machine learning in the context of sensitive genomic data processing. The technology combines secure multiparty computation techniques based on the recently proposed Manticore framework for model training and fully homomorphic encryption based on TFHE for model inference. The framework was successfully used to solve breast cancer prediction problems on gene expression datasets coming from distinct private sources while preserving their privacy - the solution winning 1st place for both Tracks I and III of the genomic privacy competition iDASH’2020. Extensive benchmarks and comparisons to existing works are performed. Our 2-party logistic regression computation is 11× faster than the one in [1] on the same dataset and it uses only one CPU core.
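Manticore and TFHE internals are well beyond a toy example, so the sketch below only illustrates the secret-sharing principle underlying secure multiparty computation: each private input is split into random additive shares modulo an assumed modulus Q, and only the aggregate is ever reconstructed. None of this code reflects GenoPPML's actual APIs or protocols.

```python
# Toy illustration of the MPC principle behind frameworks like Manticore (this is NOT
# Manticore's or TFHE's API): additive secret sharing, where each party splits its private
# value into random shares that sum to the value mod Q, so parties can jointly compute a
# sum (e.g. an aggregate training statistic) without revealing individual inputs.
import secrets

Q = 2 ** 61 - 1   # modulus for the sharing (assumed; any sufficiently large modulus works)

def share(value, n_parties):
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

# Two data owners privately hold gene-expression-derived counts and want only the total.
a_shares = share(1234, n_parties=2)
b_shares = share(5678, n_parties=2)
# Each party locally adds the shares it holds; only the combined result is revealed.
combined = [(a_shares[i] + b_shares[i]) % Q for i in range(2)]
print(reconstruct(combined))   # 6912, without either input being disclosed
```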
Cop-Flash: Utilizing hybrid storage to construct a large, efficient, and durable computational storage for DNN training
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00041 | IEEE Cloud Computing, pp. 209-218
Chunhua Xiao, S. Qiu, Dandan Xu
Traditional computing architectures that separate computing from storage face severe limitations when processing the data that is continuously produced in the cloud and at the edge. Recently, the computational storage device (CSD) has become one of the critical cloud infrastructures that can overcome these limitations. Many studies utilize CSDs for DNN training to extract useful information and knowledge from the data quickly and efficiently. However, all previous work has used homogeneous storage, which does not fully consider the requirements of DNN training on CSDs. Thus, we leverage hybrid NAND flash memory to address this problem. Nevertheless, typical hybrid storage architectures have limitations when used for DNN training, and their management strategies cannot fully exploit the heterogeneity of hybrid flash memory. To address this issue, we propose a novel SLC-TLC flash memory called Co-Partitioning Flash (Cop-Flash), which utilizes two different hybrid flash memory partitioning methods to divide storage into three regions of flash memory with different properties. Cop-Flash includes two key technologies: 1) a lifetime-based I/O identifier that identifies data hotness according to data lifetime, to maximize the benefits of heterogeneity and minimize the impact of garbage collection; and 2) erase-aware adaptive dual-zone management that increases bandwidth utilization and guarantees system reliability. We compared Cop-Flash with two related state-of-the-art hybrid storage designs using hard partitioning and soft partitioning, as well as with TLC-only flash memory, under real DNN training workloads. Experimental results show that Cop-Flash improves performance by 29.1%, 38.8%, and 56.6%, respectively, and outperforms them by 2.3x, 1.29x, and 8.3x in terms of lifespan.
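The lifetime-based I/O identifier's thresholds and bookkeeping are not given in the abstract. The hedged sketch below assumes a simple scheme: track how long each logical block lives before being overwritten and steer short-lived (hot) writes to the SLC region while long-lived (cold) data goes to TLC, the kind of placement that reduces garbage-collection pressure. Class, threshold, and smoothing choices are all assumptions.

```python
# Hedged sketch of the lifetime-based routing idea (thresholds and bookkeeping are
# assumptions; the paper's identifier is not specified in the abstract): track how long
# each logical block lives before being overwritten, and steer short-lived (hot) writes to
# the fast SLC region while long-lived (cold) data goes to the dense TLC region.
HOT_LIFETIME_S = 60.0          # assumed cutoff: blocks overwritten within a minute are "hot"

class LifetimeRouter:
    def __init__(self):
        self.last_write = {}    # lba -> timestamp of previous write
        self.lifetime = {}      # lba -> observed lifetime (exponentially smoothed)

    def route(self, lba, now):
        if lba in self.last_write:
            observed = now - self.last_write[lba]
            prev = self.lifetime.get(lba, observed)
            self.lifetime[lba] = 0.5 * prev + 0.5 * observed   # smooth the estimate
        self.last_write[lba] = now
        est = self.lifetime.get(lba)
        if est is None:
            return "SLC"        # unknown data starts in SLC and migrates if it turns cold
        return "SLC" if est < HOT_LIFETIME_S else "TLC"

router = LifetimeRouter()
print(router.route(lba=42, now=0.0))     # first write -> SLC
print(router.route(lba=42, now=5.0))     # overwritten after 5 s -> hot -> SLC
print(router.route(lba=42, now=600.0))   # now long-lived -> cold -> TLC
```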