StripeMerge: Efficient Wide-Stripe Generation for Large-Scale Erasure-Coded Storage
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00053
Qiaori Yao, Yuchong Hu, Liangfeng Cheng, P. Lee, D. Feng, Weichun Wang, Wei Chen
Erasure coding has been widely deployed in modern large-scale storage systems for storage-efficient fault tolerance by storing stripes of data and parity chunks. Recently, enterprises have explored the notion of wide stripes, which suppress the fraction of parity chunks in each stripe to achieve extreme storage savings. However, efficiently generating wide stripes remains a non-trivial issue: re-encoding the currently stored stripes (termed narrow stripes) into wide stripes triggers substantial bandwidth overhead for relocating and regenerating chunks. We propose StripeMerge, a wide-stripe generation mechanism that selects and merges narrow stripes into wide stripes, with the primary objective of minimizing the wide-stripe generation bandwidth. We prove the existence of an optimal scheme that incurs no data transfer for wide-stripe generation, yet the optimal scheme is computationally expensive. We therefore propose two heuristics that can be executed efficiently with only limited wide-stripe generation bandwidth overhead. We prototype StripeMerge and show, via both simulations and Amazon EC2 experiments, that wide-stripe generation time can be reduced by up to 87.8% over a state-of-the-art storage scaling approach.
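To make the zero-transfer case concrete, the toy sketch below (our illustration, not StripeMerge's actual code) merges two narrow stripes that each carry a single XOR parity: the wide parity is simply the XOR of the two narrow parities, so a well-chosen pair of narrow stripes can be merged without moving any data chunks.

```python
# Toy sketch of parity merging with a single XOR parity per stripe.
# Assumes all chunks have equal length; Reed-Solomon codes need the
# encoding coefficients of the two stripes to align, which is the
# pairing problem StripeMerge's heuristics address.
from functools import reduce

def xor_parity(chunks: list[bytes]) -> bytes:
    """XOR all chunks together (single-parity erasure code)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

def merge_stripes(narrow_a: list[bytes], narrow_b: list[bytes]) -> tuple[list[bytes], bytes]:
    """Merge two k-chunk narrow stripes into a 2k-chunk wide stripe.

    The wide parity is computed from the two narrow parities alone,
    illustrating why a compatible pair needs no data-chunk transfer.
    """
    parity_a = xor_parity(narrow_a)
    parity_b = xor_parity(narrow_b)
    wide_data = narrow_a + narrow_b
    wide_parity = bytes(x ^ y for x, y in zip(parity_a, parity_b))
    assert wide_parity == xor_parity(wide_data)  # sanity check
    return wide_data, wide_parity

data, parity = merge_stripes([b"\x01\x02", b"\x03\x04"], [b"\x05\x06", b"\x07\x08"])
```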
Communication-Efficient Federated Learning with Adaptive Parameter Freezing
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00010
Chen Chen, Hongao Xu, Wei Wang, Baochun Li, Bo Li, Li Chen, Gong Zhang
Federated learning allows edge devices to collaboratively train a global model by synchronizing their local updates without sharing private data. Yet, with limited network bandwidth at the edge, communication often becomes a severe bottleneck. In this paper, we find that it is unnecessary to always synchronize the full model throughout training: many parameters gradually stabilize well before the model converges and can thus be excluded from synchronization at an early stage, reducing communication overhead without compromising model accuracy. The challenges are that local parameters excluded from global synchronization may diverge across clients, and that some parameters stabilize only temporarily. To address these challenges, we propose a novel scheme called Adaptive Parameter Freezing (APF), which fixes (freezes) the non-synchronized stable parameters for intermittent periods. Specifically, the freezing periods are tentatively adjusted in an additive-increase, multiplicative-decrease manner, depending on whether the previously frozen parameters remain stable in subsequent iterations. We implemented APF as a Python module in PyTorch. Extensive experiments show that APF can reduce data transfer by over 60%.
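A minimal sketch of the freezing logic described above (the stability heuristic and constants are our assumptions, not the authors' implementation):

```python
# Adaptive Parameter Freezing sketch: each parameter block tracks a
# freezing period that grows additively while the block stays stable
# and shrinks multiplicatively once it drifts again after thawing.
import torch

class FreezeState:
    def __init__(self, period: int = 1):
        self.period = period      # rounds to stay frozen
        self.frozen_until = 0     # round index at which the block thaws

def is_stable(prev: torch.Tensor, curr: torch.Tensor, tol: float = 1e-3) -> bool:
    # Stability heuristic (our assumption): relative change below a tolerance.
    return (curr - prev).norm() <= tol * (prev.norm() + 1e-12)

def update_freezing(state: FreezeState, stable: bool, round_idx: int,
                    add: int = 1, mult: float = 0.5) -> None:
    if stable:
        state.period += add                               # additive increase
    else:
        state.period = max(1, int(state.period * mult))   # multiplicative decrease
    state.frozen_until = round_idx + state.period

def should_sync(state: FreezeState, round_idx: int) -> bool:
    """Skip global synchronization for blocks that are still frozen."""
    return round_idx >= state.frozen_until
```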
Efficiently Recovering Stateful System Components of Multi-server Microkernels
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00054
Wentai Li, Jinyu Gu, Nian Liu, B. Zang
Microkernel OSes provide OS services through mutually isolated system servers running in different user processes, which brings stronger fault isolation than monolithic OSes. Nevertheless, when it comes to the fault recovery of system servers, most existing microkernel OSes do no more than restart a faulty server, which causes the server to lose all of its runtime state and may affect every application that relies on it. In this paper, we present TxIPC, a mechanism that can efficiently recover stateful system servers on microkernel OSes. Since a system server provides its service through inter-process communication (IPC), TxIPC makes it fault resilient by handling each IPC in a transaction-like manner: if a fault happens in a server during an IPC handling procedure, TxIPC aborts all the updates made by that IPC and thus recovers the server from the fault. Evaluations show that TxIPC enables servers to recover from 99.8% of (injected) faults with 3%-45% performance overhead on application benchmarks, significantly outperforming existing counterparts.
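The transaction-like IPC handling can be pictured as follows; this is a conceptual sketch under the assumption that server state can be shadow-copied per request, not TxIPC's kernel-level mechanism:

```python
# Each IPC runs against a shadow copy of the server state; updates are
# committed only if the handler finishes without a fault, so a crash
# during one request leaves the server's state intact.
import copy

class TxServer:
    def __init__(self, state: dict):
        self.state = state

    def handle_ipc(self, handler, request):
        shadow = copy.deepcopy(self.state)   # buffer all updates
        try:
            reply = handler(shadow, request)
        except Exception:
            # Fault during this IPC: abort, committed state untouched.
            return None
        self.state = shadow                  # commit atomically
        return reply
```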
A Suspicion-Free Black-box Adversarial Attack for Deep Driving Maneuver Classification Models
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00080
Ankur Sarker, Haiying Shen, Tanmoy Sen
Autonomous vehicles are equipped with onboard deep neural network (DNN) models to process data from different sensor and communication units. In the connected autonomous vehicle (CAV) scenario, each vehicle receives time-series driving signals (e.g., speed, brake status) from nearby vehicles through wireless communication. Several black-box adversarial attacks have been proposed for this scenario, in which an attacker deliberately sends false driving signals to a nearby vehicle to fool its onboard DNN model and cause traffic incidents. However, previously proposed black-box adversarial attacks can be easily detected. To address this problem, we propose the Suspicion-free Boundary Black-box Adversarial (SBBA) attack, in which the attacker uses the DNN model's output to design the adversarial perturbation. First, we formulate attack design as a goal-satisfying optimization problem with constraints, so that the attack is not easily flagged by detection methods. Second, we solve the optimization problem with Bayesian optimization: we use a Gaussian process to model the posterior distribution of the DNN model and the knowledge gradient function to choose the next sample point, and we devise a gradient estimation technique for the knowledge gradient method to reduce the solution search time. Finally, we conduct extensive experimental evaluations on two real driving datasets. The results show that SBBA outperforms previous adversarial attacks with a 56% higher success rate under detection methods, 238% less time to launch the attacks, 76% less perturbation (to avoid being detected), and 257% fewer queries (to the DNN model to verify attack success).
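For intuition, here is a generic Bayesian-optimization loop over candidate perturbations; it substitutes a simple lower-confidence-bound acquisition for the paper's knowledge-gradient function, whose gradient-estimation machinery is more involved:

```python
# Generic black-box Bayesian optimization sketch (illustrative only).
# objective(x) would query the victim model and return a score to minimize.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt(objective, dim, n_init=5, n_iter=20, n_cand=256, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n_init, dim))      # initial perturbations
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                # posterior over objective
        cand = rng.uniform(-1, 1, size=(n_cand, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        acq = mu - 1.96 * sigma                     # lower confidence bound
        x_next = cand[np.argmin(acq)]               # most promising candidate
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)]                          # best perturbation found
```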
ProChecker: An Automated Security and Privacy Analysis Framework for 4G LTE Protocol Implementations
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00079
Imtiaz Karim, Syed Rafiul Hussain, E. Bertino
Cellular protocol implementations must comply with the specifications as well as with security and privacy requirements. These implementations, however, often deviate from the security and privacy requirements due to underspecification in cellular standards, inherent protocol complexity, and design flaws that induce logical vulnerabilities. Detecting such logical vulnerabilities in the complex and stateful 4G LTE protocol is challenging because of operational dependencies on internal states and intertwined protocol interactions among multiple participants. In this paper, we address these challenges and develop ProChecker, which (1) extracts a precise semantic model of the implementation as a finite-state machine by combining dynamic testing with static instrumentation, and (2) verifies properties against the extracted model by combining a symbolic model checker with a cryptographic protocol verifier. We demonstrate the effectiveness of ProChecker by evaluating it with 62 properties on a closed-source implementation and two of the most popular open-source 4G LTE control-plane protocol implementations. ProChecker unveiled 3 new protocol-specific logical attacks and 6 implementation issues, and detected 14 prior attacks. The impact of the attacks ranges from denial-of-service, broken integrity, encryption, and replay protection to privacy leakage.
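A toy version of the second step, checking a safety property against an extracted finite-state machine (the FSM and the property below are hypothetical illustrations, not ProChecker's models):

```python
# Explore all reachable states of a protocol FSM and report transitions
# that violate a safety predicate, together with the trace leading there.
from collections import deque

def check_safety(fsm: dict, start: str, bad_predicate) -> list:
    """fsm maps state -> list of (message, next_state); returns violating traces."""
    violations, seen = [], {start}
    queue = deque([(start, [])])
    while queue:
        state, trace = queue.popleft()
        for msg, nxt in fsm.get(state, []):
            step = trace + [(state, msg, nxt)]
            if bad_predicate(nxt, msg):
                violations.append(step)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, step))
    return violations

# Hypothetical property: no plaintext identity request accepted after
# the security context is established.
fsm = {"idle": [("attach_request", "auth")],
       "auth": [("auth_response", "secured")],
       "secured": [("plain_identity_request", "leak")]}
print(check_safety(fsm, "idle", lambda state, msg: state == "leak"))
```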
Poster: WallGuard - A Deep Learning Approach for Avoiding Regrettable Posts in Social Media
Pub Date: 2021-07-01 | DOI: 10.1109/ICDCS51616.2021.00127
Haya Shulman, Hervais Simo
We develop WallGuard to help users of online social networks (OSNs) avoid regrettable posts and disclosure of sensitive information. With WallGuard, users can control their posts and can (i) detect inappropriate, regrettable messages before they are posted, and (ii) identify already-posted messages that could negatively impact a user's reputation and life. WallGuard is based on deep learning architectures and NLP methods. To evaluate its effectiveness, we developed a semi-supervised self-training methodology, which we use to create a new large-scale corpus for regret detection comprising 4.7 million OSN messages. The corpus is generated by incrementally labelling messages from large OSN platforms, relying on both human-labelled and machine-labelled messages. By training Facebook's FastText word embeddings and Word2vec embeddings on our corpus, we created domain-specific word embeddings, which we refer to as regret embeddings. This approach allows us to extract features that are discriminative of regrettable disclosures. Leveraging both the regret embeddings and the new corpus, we train and evaluate five new multi-label deep learning models for automatically classifying regrettable posts. Our evaluation demonstrates that we can detect messages with regrettable topics, achieving up to 0.975 weighted AUC, 82.2% precision, and 74.6% recall. WallGuard is free and open-source.
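The semi-supervised self-training loop can be sketched as follows (a minimal illustration with an assumed confidence threshold and a simple classifier, not WallGuard's deep models):

```python
# Classic self-training: train on the human-labelled seed set, pseudo-label
# unlabelled posts on which the classifier is confident, fold them into the
# training set, and repeat.
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(lab_texts, lab_y, unlab_texts, threshold=0.95, rounds=5):
    vec = TfidfVectorizer().fit(lab_texts + unlab_texts)
    X_lab, y = vec.transform(lab_texts), np.array(lab_y)
    X_unlab = vec.transform(unlab_texts)
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X_lab, y)
        if X_unlab.shape[0] == 0:
            break
        proba = clf.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold        # confident machine labels
        if not keep.any():
            break
        X_lab = sparse.vstack([X_lab, X_unlab[keep]])
        y = np.concatenate([y, proba[keep].argmax(axis=1)])
        X_unlab = X_unlab[~keep]                     # shrink the unlabelled pool
    return clf, vec
```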
Demo: Cloak: A Framework For Development of Confidential Blockchain Smart Contracts
Pub Date: 2021-06-25 | DOI: 10.1109/ICDCS51616.2021.00111
Qian Ren, Han Liu, Yue Li, Hong Lei
In recent years, as blockchain adoption has expanded across a wide range of domains (e.g., digital assets, supply chain finance), the confidentiality of smart contracts has become a fundamental demand for practical applications. However, while new privacy protection techniques keep coming out, how existing ones can best fit development settings is little studied. Suffering from limited architectural support in terms of programming interfaces, state-of-the-art solutions can hardly reach general developers. In this paper, we propose the Cloak framework for developing confidential smart contracts. The key capability of Cloak is allowing developers to implement and deploy practical solutions to multi-party transaction (MPT) problems, i.e., transactions with secret inputs and states owned by different parties, simply by specifying them as such. To this end, Cloak introduces a domain-specific annotation language for declaring privacy specifications and then automatically generates confidential smart contracts to be deployed with a trusted execution environment (TEE) on the blockchain. In our evaluation on both simple and real-world applications, developers managed to deploy business services on the blockchain concisely, writing Cloak smart contracts less than 30% of the size of the deployed code.
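To illustrate the MPT idea, here is a hypothetical Python simulation; it is not Cloak's annotation syntax or TEE integration:

```python
# Simulated multi-party transaction: each party contributes a secret input,
# the computation runs inside a trusted enclave (modelled here as a function
# scope), and only the declared public output is revealed.
def tee_mpt(secret_inputs: dict, compute):
    """Simulate an enclave: secrets never escape this function's scope."""
    public_output = compute(secret_inputs)
    return public_output          # only the result is published on-chain

# Example: a sealed-bid auction; parties never see each other's bids.
bids = {"alice": 10, "bob": 17, "carol": 12}      # hypothetical secrets
winner = tee_mpt(bids, lambda s: max(s, key=s.get))
print(winner)                     # "bob"; the bid values stay confidential
```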
GDDR: GNN-based Data-Driven Routing
Pub Date: 2021-04-20 | DOI: 10.1109/ICDCS51616.2021.00056
Oliver Hope, Eiko Yoneki
We explore the feasibility of combining Graph Neural Network-based policy architectures with Deep Reinforcement Learning as an approach to problems in systems. This fits particularly well with operations on networks, which naturally take the form of graphs. As a case study, we take the idea of data-driven routing in intradomain traffic engineering, whereby the routing of data in a network can be managed taking into account the data itself. The particular subproblem which we examine is minimising link congestion in networks using knowledge of historic traffic flows. We show through experiments that an approach using Graph Neural Networks (GNNs) performs at least as well as previous work using Multilayer Perceptron architectures. GNNs have the added benefit that they allow for the generalisation of trained agents to different network topologies with no extra work. Furthermore, we believe that this technique is applicable to a far wider selection of problems in systems research.
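A minimal message-passing layer of the kind such a policy could build on (an illustrative sketch, not GDDR's architecture) also shows why GNN policies transfer across topologies: the learned weights are shared over nodes and edges, so their shapes do not depend on the graph size.

```python
# One round of message passing over an adjacency matrix, followed by
# per-edge scores that an RL policy could turn into routing weights.
import numpy as np

def gnn_layer(adj: np.ndarray, h: np.ndarray, w_self: np.ndarray, w_nbr: np.ndarray):
    """Aggregate neighbour features, then mix with the node's own features."""
    msgs = adj @ h                            # sum of neighbour embeddings
    return np.tanh(h @ w_self + msgs @ w_nbr)

def edge_scores(adj: np.ndarray, h: np.ndarray) -> dict:
    """Score each directed edge (u, v) from its endpoint embeddings."""
    n = adj.shape[0]
    return {(u, v): float(h[u] @ h[v])
            for u in range(n) for v in range(n) if adj[u, v]}

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # toy topology
h = rng.normal(size=(3, 4))                                     # node features
h = gnn_layer(adj, h, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
print(edge_scores(adj, h))
```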
Upper and Lower Bounds for Deterministic Approximate Objects
Pub Date: 2021-04-20 | DOI: 10.1109/ICDCS51616.2021.00049
Danny Hendler, A. Khattabi, A. Milani, Corentin Travers
Relaxing the sequential specification of shared objects has been proposed as a promising approach to obtain implementations with better complexity. In this paper, we study the step complexity of relaxed variants of two common shared objects: max registers and counters. In particular, we consider the $k$-multiplicative-accurate max register and the $k$-multiplicative-accurate counter, where read operations are allowed to err by a multiplicative factor of $k$ (for some $k \in \mathbb{N}$). More precisely, reads are allowed to return an approximate value $x$ of the maximum value $v$ previously written to the max register, or of the number $v$ of increments previously applied to the counter, respectively, such that $v/k \leq x \leq v \cdot k$. We provide upper and lower bounds on the complexity of implementing these objects in a wait-free manner in the shared memory model.
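The relaxed semantics can be illustrated with a sequential sketch of a $k$-multiplicative-accurate counter (illustration only; the paper's constructions are wait-free shared-memory algorithms). The shared value is rewritten only when the true count grows past a factor of $k$, which is exactly the slack that enables cheaper implementations.

```python
# Sequential illustration of k-multiplicative-accurate counter semantics:
# the reader-visible value is refreshed only when the true count leaves
# the window [shared, shared * k], so reads satisfy v/k <= x <= v * k
# while the number of visible writes stays logarithmic in v.
class ApproxCounter:
    def __init__(self, k: int):
        assert k >= 1
        self.k = k
        self.true_count = 0
        self.shared = 0                    # value visible to readers

    def increment(self) -> None:
        self.true_count += 1
        if self.shared == 0 or self.true_count > self.shared * self.k:
            self.shared = self.true_count  # occasional refresh

    def read(self) -> int:
        v, x = self.true_count, self.shared
        assert x == 0 or v / self.k <= x <= v * self.k  # accuracy invariant
        return x
```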
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
Pub Date: 2021-04-16 | DOI: 10.1109/ICDCS51616.2021.00057
Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo
Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters. A critical factor in determining training throughput and model accuracy is the choice of parameter synchronization protocol. For example, while Bulk Synchronous Parallel (BSP) often achieves better converged accuracy, its training throughput can be negatively impacted by stragglers. In contrast, Asynchronous Parallel (ASP) can have higher throughput, but its convergence and accuracy can be impacted by stale gradients. To improve synchronization performance, recent work often focuses on designing new protocols that rely heavily on hard-to-tune hyper-parameters. In this paper, we design a hybrid synchronization approach that exploits the benefits of both BSP and ASP, i.e., reducing training time while maintaining converged accuracy. Based on extensive empirical profiling, we devise a collection of adaptive policies that determine how and when to switch between synchronization protocols, including offline policies that target recurring jobs and online policies for handling transient stragglers. We implement the proposed policies in a prototype system called Sync-Switch on top of TensorFlow, and evaluate training performance with popular deep learning models and datasets. Our experiments show that Sync-Switch achieves ASP-level training speedup while maintaining converged accuracy similar to BSP. Moreover, Sync-Switch's elastic-based policy adequately mitigates the impact of transient stragglers.
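A simplified sketch of what such a switching policy might look like (the policy shape and all constants are our assumptions, not Sync-Switch's implementation):

```python
# Offline policy: run BSP for an initial fraction of epochs to lock in
# convergence quality, then switch to ASP for throughput. Online guard:
# fall back to ASP temporarily when a transient straggler is detected.
def choose_protocol(epoch: int, total_epochs: int, switch_frac: float,
                    worker_times: list[float], straggler_ratio: float = 2.0) -> str:
    median = sorted(worker_times)[len(worker_times) // 2]
    if max(worker_times) > straggler_ratio * median:
        return "ASP"                      # online policy: dodge the straggler
    return "BSP" if epoch < switch_frac * total_epochs else "ASP"

# Example: with switch_frac=0.25, a 40-epoch job runs BSP for epochs 0-9,
# then ASP, unless a straggler forces an early switch (as here).
print(choose_protocol(epoch=3, total_epochs=40, switch_frac=0.25,
                      worker_times=[1.0, 1.1, 1.0, 3.5]))   # prints "ASP"
```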