Machine learning inference applications have proliferated through diverse domains such as healthcare, security, and analytics. Recent work has proposed inference serving systems for improving the deployment and scalability of models. To improve resource utilization, multiple models can be co-located on the same backend machine. However, co-location can cause latency degradation due to interference and can subsequently violate latency requirements. Although interference-aware schedulers for general workloads have been introduced, they do not scale appropriately to heterogeneous inference serving systems where the number of co-location configurations grows exponentially with the number of models and machine types. This paper proposes an interference-aware scheduler for heterogeneous inference serving systems, reducing the latency degradation from co-location interference. We characterize the challenges in predicting the impact of co-location interference on inference latency (e.g., varying latency degradation across machine types), and identify properties of models and hardware that should be considered during scheduling. We then propose a unified prediction model that estimates an inference model's latency degradation during co-location, and develop an interference-aware scheduler that leverages this predictor. Our preliminary results show that our interference-aware scheduler achieves 2× lower latency degradation than a commonly used least-loaded scheduler. We also discuss future research directions for interference-aware schedulers for inference serving systems.
{"title":"Interference-Aware Scheduling for Inference Serving","authors":"Daniel Mendoza, Francisco Romero, Qian Li, N. Yadwadkar, C. Kozyrakis","doi":"10.1145/3437984.3458837","DOIUrl":"https://doi.org/10.1145/3437984.3458837","url":null,"abstract":"Machine learning inference applications have proliferated through diverse domains such as healthcare, security, and analytics. Recent work has proposed inference serving systems for improving the deployment and scalability of models. To improve resource utilization, multiple models can be co-located on the same backend machine. However, co-location can cause latency degradation due to interference and can subsequently violate latency requirements. Although interference-aware schedulers for general workloads have been introduced, they do not scale appropriately to heterogeneous inference serving systems where the number of co-location configurations grows exponentially with the number of models and machine types. This paper proposes an interference-aware scheduler for heterogeneous inference serving systems, reducing the latency degradation from co-location interference. We characterize the challenges in predicting the impact of co-location interference on inference latency (e.g., varying latency degradation across machine types), and identify properties of models and hardware that should be considered during scheduling. We then propose a unified prediction model that estimates an inference model's latency degradation during co-location, and develop an interference-aware scheduler that leverages this predictor. Our preliminary results show that our interference-aware scheduler achieves 2× lower latency degradation than a commonly used least-loaded scheduler. We also discuss future research directions for interference-aware schedulers for inference serving systems.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126722846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recommendation systems (RS) are a key component of modern commercial platforms, with Collaborative Filtering (CF) based RSs being the centrepiece. Relevant research has long focused on measuring and improving the effectiveness of such CF systems, but their efficiency - especially with regard to their time- and resource-consuming training phase - has received little to no attention. This work is a first step towards addressing this gap. To do so, we first perform a methodical study of the computational complexity of the training phase for a number of highly popular CF-based RSs, including approaches based on matrix factorisation, k-nearest neighbours, co-clustering, and slope one schemes. Based on this, we then build a simple yet effective predictor that, given a small sample of a dataset, is able to predict training times over the complete dataset. Our systematic experimental evaluation shows that our approach outperforms state-of-the-art regression schemes by a considerable margin.
{"title":"Are we there yet? Estimating Training Time for Recommendation Systems","authors":"I. Paun, Yashar Moshfeghi, Nikos Ntarmos","doi":"10.1145/3437984.3458832","DOIUrl":"https://doi.org/10.1145/3437984.3458832","url":null,"abstract":"Recommendation systems (RS) are a key component of modern commercial platforms, with Collaborative Filtering (CF) based RSs being the centrepiece. Relevant research has long focused on measuring and improving the effectiveness of such CF systems, but alas their efficiency - especially with regards to their time- and resource-consuming training phase - has received little to no attention. This work is a first step in the direction of addressing this gap. To do so, we first perform a methodical study of the computational complexity of the training phase for a number of highly popular CF-based RSs, including approaches based on matrix factorisation, k-nearest neighbours, co-clustering, and slope one schemes. Based on this, we then build a simple yet effective predictor that, given a small sample of a dataset, is able to predict training times over the complete dataset. Our systematic experimental evaluation shows that our approach outperforms state-of-the-art regression schemes by a considerable margin.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"357 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115940954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The microservice architecture allows applications to be designed in a modular format, whereby each microservice can implement a single functionality and can be independently managed and deployed. However, an undesirable side-effect of this modular design is the large state space of possibly inter-dependent configuration parameters (of the constituent microservices) which have to be tuned to improve application performance. This workshop paper investigates optimization techniques and dimensionality reduction strategies for tuning microservices applications, empirically demonstrating the significant tail latency improvements (as much as 23%) that can be achieved with configuration tuning.
{"title":"Towards Optimal Configuration of Microservices","authors":"Gagan Somashekar, Anshul Gandhi","doi":"10.1145/3437984.3458828","DOIUrl":"https://doi.org/10.1145/3437984.3458828","url":null,"abstract":"The microservice architecture allows applications to be designed in a modular format, whereby each microservice can implement a single functionality and can be independently managed and deployed. However, an undesirable side-effect of this modular design is the large state space of possibly inter-dependent configuration parameters (of the constituent microservices) which have to be tuned to improve application performance. This workshop paper investigates optimization techniques and dimensionality reduction strategies for tuning microservices applications, empirically demonstrating the significant tail latency improvements (as much as 23%) that can be achieved with configuration tuning.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125471517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in deep learning allow on-demand reduction of model complexity, without a need for re-training, thus enabling a dynamic trade-off between inference accuracy and energy savings. Approximate mobile computing, on the other hand, adapts the level of computation approximation as the context of usage, and consequently the computation or result-accuracy requirements, vary. In this work, we propose a synergy between the two directions and develop a context-aware method for dynamically adjusting the width of an on-device neural network based on the input and the context-dependent classification confidence. We implement our method on a human activity recognition neural network and, through measurements on a real-world embedded device, demonstrate that such a network would save up to 37.8% energy while inducing only a 1% loss of accuracy when used for continuous activity monitoring in the field of elderly care.
{"title":"Queen Jane Approximately: Enabling Efficient Neural Network Inference with Context-Adaptivity","authors":"O. Machidon, Davor Sluga, V. Pejović","doi":"10.1145/3437984.3458833","DOIUrl":"https://doi.org/10.1145/3437984.3458833","url":null,"abstract":"Recent advances in deep learning allow on-demand reduction of model complexity, without a need for re-training, thus enabling a dynamic trade-off between the inference accuracy and the energy savings. Approximate mobile computing, on the other hand, adapts the computation approximation level as the context of usage, and consequently the computation needs or result accuracy needs, vary. In this work, we propose a synergy between the two directions and develop a context-aware method for dynamically adjusting the width of an on-device neural network based on the input and context-dependent classification confidence. We implement our method on a human activity recognition neural network and through measurements on a real-world embedded device demonstrate that such a network would save up to 37.8% energy and induce only 1% loss of accuracy, if used for continuous activity monitoring in the field of elderly care.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115023828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The choice of convolutional routines (or primitives) for implementing the operations in a Convolutional Neural Network (CNN) has a tremendous impact on the inference time. To optimise the execution latency for a target system, a lengthy profiling stage is needed - iterating over all the implementations of convolutional primitives in the configuration of each layer to measure their execution time on that platform. Each primitive exercises the system resources in different ways, so new profiling is currently needed when optimising for another system. In this work, we replace this prohibitively expensive profiling stage with a machine learning based approach to performance modelling. Our approach drastically speeds up the optimisation by estimating the latency of convolutional primitives in any layer configuration running on a target system. We reduce the time needed for optimising the execution of large neural networks on an ARM Cortex-A73 system from hours to just seconds. Our performance model is easily transferable across target platforms. This is demonstrated by training a performance model on an Intel platform and transferring its predictive performance to AMD and ARM systems, using very few profiled samples from the target platforms for fine-tuning the performance model.
{"title":"Fast Optimisation of Convolutional Neural Network Inference using System Performance Models","authors":"Rik Mulder, Valentin Radu, Christophe Dubach","doi":"10.1145/3437984.3458840","DOIUrl":"https://doi.org/10.1145/3437984.3458840","url":null,"abstract":"The choice of convolutional routines (or primitives) for implementing the operations in a Convolutional Neural Network (CNN) has a tremendous impact over the inference time. To optimise the execution latency for a target system, a lengthy profiling stage is needed - iterating over all the implementations of convolutional primitives in the configuration of each layer to measure their execution time on that platform. Each primitive exercises the system resources in different ways, so new profiling is currently needed when optimising for another system. In this work, we replace this prohibitively expensive profiling stage with a machine learning based approach of performance modelling. Our approach drastically speeds up the optimisation by estimating the latency of convolutional primitives in any layer configuration running on a target system. We reduce the time needed for optimising the execution of large neural networks on an ARM Cortex-A73 system from hours to just seconds. Our performance model is easily transferable across target platforms. This is demonstrated by training a performance model on an Intel platform and transferring its predictive performance to AMD and ARM systems, using very few profiled samples from the target platforms for fine-tuning the performance model.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133821197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Private and public clouds require users to specify requests for resources such as CPU and memory (RAM) to be provisioned for their applications. The values of these requests do not necessarily relate to the application's run-time requirements, but only help the cloud infrastructure resource manager to map requested resources to physical resources. If an application exceeds these values, it might be throttled or even terminated. As a consequence, requested values are often overestimated, resulting in poor resource utilization in the cloud infrastructure. Autoscaling is a technique used to overcome these problems. We observed that Kubernetes Vertical Pod Autoscaler (VPA) might be using an autoscaling strategy that performs poorly on workloads that change periodically. Our experimental results show that, compared to VPA, predictive methods based on Holt-Winters exponential smoothing (HW) and Long Short-Term Memory (LSTM) can decrease CPU slack by over 40% while avoiding CPU insufficiency for various CPU workloads. Furthermore, LSTM has been shown to generate more stable predictions than HW, allowing for more robust scaling decisions.
{"title":"Predicting CPU usage for proactive autoscaling","authors":"Thomas Wang, Simone Ferlin Oliveira, Marco Chiesa","doi":"10.1145/3437984.3458831","DOIUrl":"https://doi.org/10.1145/3437984.3458831","url":null,"abstract":"Private and public clouds require users to specify requests for resources such as CPU and memory (RAM) to be provisioned for their applications. The values of these requests do not necessarily relate to the application's run-time requirements, but only help the cloud infrastructure resource manager to map requested resources to physical resources. If an application exceeds these values, it might be throttled or even terminated. As a consequence, requested values are often overestimated, resulting in poor resource utilization in the cloud infrastructure. Autoscaling is a technique used to overcome these problems. We observed that Kubernetes Vertical Pod Autoscaler (VPA) might be using an autoscaling strategy that performs poorly on workloads that periodically change. Our experimental results show that compared to VPA, predictive methods based on Holt-Winters exponential smoothing (HW) and Long Short-Term Memory (LSTM) can decrease CPU slack by over 40% while avoiding CPU insufficiency for various CPU workloads. Furthermore, LSTM has been shown to generate stabler predictions compared to that of HW, which allowed for more robust scaling decisions.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121171570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning (ML) for developing Intrusion Detection Systems (IDS) is a fast-evolving research area that has many unsolved domain challenges. Current IDS models face two challenges that limit their performance and robustness. Firstly, they require large datasets to train and their performance is highly dependent on the dataset size. Secondly, zero-day attacks demand that machine learning models are retrained in order to identify future attacks of this type. However, the sophistication and increasing rate of cyber attacks make retraining time prohibitive for practical implementation. This paper proposes a new IDS model that can learn from pair similarities rather than class discriminative features. Learning similarities requires less data for training and provides the ability to flexibly adapt to new cyber attacks, thus reducing the burden of retraining. The underlying model is based on Siamese Networks; therefore, given a number of instances, numerous similar and dissimilar pairs can be generated. The model is evaluated using three mainstream IDS datasets: CICIDS2017, KDD Cup'99, and NSL-KDD. The evaluation results confirm the ability of the Siamese Network model to suit IDS purposes by classifying cyber attacks based on similarity-based learning. This opens a new research direction for building adaptable IDS models using non-conventional ML techniques.
{"title":"Developing a Siamese Network for Intrusion Detection Systems","authors":"Hanan Hindy, C. Tachtatzis, Robert C. Atkinson, Ethan Bayne, X. Bellekens","doi":"10.1145/3437984.3458842","DOIUrl":"https://doi.org/10.1145/3437984.3458842","url":null,"abstract":"Machine Learning (ML) for developing Intrusion Detection Systems (IDS) is a fast-evolving research area that has many unsolved domain challenges. Current IDS models face two challenges that limit their performance and robustness. Firstly, they require large datasets to train and their performance is highly dependent on the dataset size. Secondly, zero-day attacks demand that machine learning models are retrained in order to identify future attacks of this type. However, the sophistication and increasing rate of cyber attacks make retraining time prohibitive for practical implementation. This paper proposes a new IDS model that can learn from pair similarities rather than class discriminative features. Learning similarities requires less data for training and provides the ability to flexibly adapt to new cyber attacks, thus reducing the burden of retraining. The underlying model is based on Siamese Networks, therefore, given a number of instances, numerous similar and dissimilar pairs can be generated. The model is evaluated using three mainstream IDS datasets; CICIDS2017, KDD Cup'99, and NSL-KDD. The evaluation results confirm the ability of the Siamese Network model to suit IDS purposes by classifying cyber attacks based on similarity-based learning. This opens a new research direction for building adaptable IDS models using non-conventional ML techniques.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"46 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130411247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, horizontal, and pipeline parallelism. However, selecting the best set of strategies for a given model and hardware configuration is challenging because debugging and testing on clusters is expensive. In this work we propose DistIR, an IR for explicitly representing distributed DNN computation that can capture many popular distribution strategies. We build an analysis framework for DistIR programs, including a simulator and reference executor that can be used to automatically search for an optimal distribution strategy. Our unified global representation also eases development of new distribution strategies, as one can reuse the lowering to per-rank backend programs. Preliminary results using a grid search over a hybrid data/horizontal/pipeline-parallel space suggest DistIR and its simulator can aid automatic DNN distribution.
{"title":"DistIR: An Intermediate Representation for Optimizing Distributed Neural Networks","authors":"Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, A. Fitzgibbon, Tim Harris","doi":"10.1145/3437984.3458829","DOIUrl":"https://doi.org/10.1145/3437984.3458829","url":null,"abstract":"The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, horizontal, and pipeline parallelism. However, selecting the best set of strategies for a given model and hardware configuration is challenging because debugging and testing on clusters is expensive. In this work we propose DistIR, an IR for explicitly representing distributed DNN computation that can capture many popular distribution strategies. We build an analysis framework for DistIR programs, including a simulator and reference executor that can be used to automatically search for an optimal distribution strategy. Our unified global representation also eases development of new distribution strategies, as one can reuse the lowering to per-rank backend programs. Preliminary results using a grid search over a hybrid data/horizontal/pipeline-parallel space suggest DistIR and its simulator can aid automatic DNN distribution.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130191170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated learning (FL) is increasingly becoming the norm for training models over distributed and private datasets. Major service providers rely on FL to improve services such as text auto-completion, virtual keyboards, and item recommendations. Nonetheless, training models with FL in practice requires a significant amount of time (days or even weeks) because FL tasks execute in highly heterogeneous environments where devices are widespread yet have only limited computing capabilities and network connectivity. In this paper, we focus on mitigating the extent of device heterogeneity, which is a main contributing factor to training time in FL. We propose AQFL, a simple and practical approach leveraging adaptive model quantization to homogenize the computing resources of the clients. We evaluate AQFL on five common FL benchmarks. The results show that, in heterogeneous settings, AQFL obtains nearly the same model quality and fairness as training in homogeneous settings.
{"title":"Towards Mitigating Device Heterogeneity in Federated Learning via Adaptive Model Quantization","authors":"A. Abdelmoniem, M. Canini","doi":"10.1145/3437984.3458839","DOIUrl":"https://doi.org/10.1145/3437984.3458839","url":null,"abstract":"Federated learning (FL) is increasingly becoming the norm for training models over distributed and private datasets. Major service providers rely on FL to improve services such as text auto-completion, virtual keyboards, and item recommendations. Nonetheless, training models with FL in practice requires significant amount of time (days or even weeks) because FL tasks execute in highly heterogeneous environments where devices only have widespread yet limited computing capabilities and network connectivity conditions. In this paper, we focus on mitigating the extent of device heterogeneity, which is a main contributing factor to training time in FL. We propose AQFL, a simple and practical approach leveraging adaptive model quantization to homogenize the computing resources of the clients. We evaluate AQFL on five common FL benchmarks. The results show that, in heterogeneous settings, AQFL obtains nearly the same quality and fairness of the model trained in homogeneous settings.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114426236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RocksDB is a general-purpose embedded key-value store used in multiple different settings. Its versatility comes at the cost of complex tuning configurations. This paper investigates maximizing the throughput of RocksDB IO operations by auto-tuning ten parameters of varying ranges. Off-the-shelf optimizers struggle with high-dimensional problem spaces and require a large number of training samples. We propose two techniques to tackle this problem: multitask modeling and dimensionality reduction through clustering. By incorporating adjacent optimization in the model, the model converged faster and found complicated settings that other tuners could not find. This approach had an additional computational complexity overhead, which we mitigated by manually assigning parameters to each sub-goal through our knowledge of RocksDB. The model is then incorporated in a standard Bayesian Optimization loop to find parameters that maximize RocksDB's IO throughput. Our method achieved a 1.3× improvement when benchmarked against a simulation of Facebook's social graph traffic, and converged in ten optimization steps compared to other state-of-the-art methods that required fifty steps.
{"title":"High-Dimensional Bayesian Optimization with Multi-Task Learning for RocksDB","authors":"Sami Alabed, Eiko Yoneki","doi":"10.1145/3437984.3458841","DOIUrl":"https://doi.org/10.1145/3437984.3458841","url":null,"abstract":"RocksDB is a general-purpose embedded key-value store used in multiple different settings. Its versatility comes at the cost of complex tuning configurations. This paper investigates maximizing the throughput of RocksDB 10 operations by auto-tuning ten parameters of varying ranges. Off-the-shelf optimizers struggle with high-dimensional problem spaces and require a large number of training samples. We propose two techniques to tackle this problem: multitask modeling and dimensionality reduction through clustering. By incorporating adjacent optimization in the model, the model converged faster and found complicated settings that other tuners could not find. This approach had an additional computational complexity overhead, which we mitigated by manually assigning parameters to each sub-goal through our knowledge of RocksDB. The model is then incorporated in a standard Bayesian Optimization loop to find parameters that maximize RocksDB's 10 throughput. Our method achieved x1.3 improvement when bench-marked against a simulation of Facebook's social graph traffic, and converged in ten optimization steps compared to other state-of-the-art methods that required fifty steps.","PeriodicalId":269840,"journal":{"name":"Proceedings of the 1st Workshop on Machine Learning and Systems","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128115772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}