The Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented at the ACM SIGMETRICS 2023 conference on June 19-23, 2023, in Orlando, Florida, USA. These papers were selected during the fall submission round by the 91 members of the ACM SIGMETRICS 2023 program committee via a rigorous review process. Each paper was conditionally accepted (and shepherded), allowed a "one-shot" revision (to be resubmitted to one of the subsequent three SIGMETRICS deadlines), or rejected (with re-submission allowed after a year).

For this issue, which represents the fall deadline, POMACS is publishing 26 papers out of 119 submissions. All submissions received at least three reviews, and borderline cases were extensively discussed during the online program committee meeting. Based on the indicated track(s), roughly 21% of the submissions were in the Theory track, 40% in the Measurement & Applied Modeling track, 29% in the Systems track, and 39% in the Learning track (submissions could indicate multiple tracks, so these percentages sum to more than 100%).

Many individuals contributed to the success of this issue of POMACS. First, we would like to thank the authors, who submitted their best work to SIGMETRICS/POMACS. Second, we would like to thank the program committee members, who provided constructive feedback in their reviews and participated in the online discussions and the program committee meeting. We also thank the several external reviewers who provided their expert opinion on specific submissions that required additional input. We are also grateful to the SIGMETRICS Board Chair, Giuliano Casale, and to past program committee Chairs, Niklas Carlsson, Edith Cohen, and Philippe Robert, who provided a wealth of information and guidance. Finally, we are grateful to the Organization Committee and to the SIGMETRICS Board for their ongoing efforts and initiatives in creating an exciting program for ACM SIGMETRICS 2023.
{"title":"POMACS V7, N1, March 2023 Editorial","authors":"K. Avrachenkov, P. Gill, B. Urgaonkar","doi":"10.1145/3579311","DOIUrl":"https://doi.org/10.1145/3579311","url":null,"abstract":"The Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented at the ACM SIGMETRICS 2023 conference on June 19-23, 2023, in Orlando, Florida, USA. These papers have been selected during the fall submission round by the 91 members of the ACM SIGMETRICS 2023 program committee via a rigorous review process. Each paper was conditionally accepted (and shepherded), allowed a \"one-shot\" revision (to be resubmitted to one of the subsequent three SIGMETRICS deadlines), or rejected (with re-submission allowed after a year). For this issue, which represents the fall deadline, POMACS is publishing 26 papers out of 119 submissions. All submissions received at least 3 reviews and borderline cases were extensively discussed during the online program committee meeting. Based on the indicated track(s), roughly 21% of the submissions were in the Theory track, 40% were in the Measurement & Applied Modeling track, 29% were in the Systems track, and 39% were in the Learning track. Many individuals contributed to the success of this issue of POMACS. First, we would like to thank the authors, who submitted their best work to SIGMETRICS/POMACS. Second, we would like to thank the program committee members who provided constructive feedback in their reviews to authors and participated in the online discussions and program committee meeting. We also thank the several external reviewers who provided their expert opinion on specific submissions that required additional input. We are also grateful to the SIGMETRICS Board Chair, Giuliano Casale, and to past program committee Chairs, Niklas Carlsson, Edith Cohen, and Philippe Robert, who provided a wealth of information and guidance. Finally, we are grateful to the Organization Committee and to the SIGMETRICS Board for their ongoing efforts and initiatives for creating an exciting program for ACM SIGMETRICS 2023.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125601937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A diverse set of scheduling objectives (e.g., resource contention, fairness, priority) has bred a series of objective-specific schedulers for multi-core architectures. Existing designs incorporate thread-to-thread statistics at runtime and schedule threads based on this abstraction (we formalize thread-to-thread interaction as the Thread-Interaction Matrix). However, the abstraction also reveals a consistently overlooked issue: the Thread-Interaction Matrix (TIM) is highly sparse. Existing designs therefore deliver only sub-optimal decisions, since sparsity limits the number of thread permutations (and their statistics) that can be exploited when making scheduling decisions. We introduce Sparsity-Lightened Intelligent Thread Scheduling (SLITS), a general scheduler design that mitigates the sparsity of the TIM and can be customized for different scheduling objectives. SLITS is built on the key insight that the sparsity of the TIM can be effectively mitigated via advanced Machine Learning (ML) techniques. SLITS has three components. First, SLITS profiles thread interactions for only a small number of thread permutations and forms the TIM from the run-time statistics. Second, SLITS estimates the missing values in the TIM using a Factorization Machine (FM), an ML technique that can fill in the missing values of a large-scale sparse matrix from limited information. Third, SLITS leverages Lazy Reschedule, a general mechanism that serves as the building block for customizing scheduling policies to different objectives. We show how SLITS can be (1) customized for different scheduling objectives, including resource contention and fairness, and (2) implemented with only negligible hardware costs. We also discuss how SLITS can potentially be applied to other thread-scheduling contexts. We evaluate two SLITS variants against four state-of-the-art scheduler designs. Averaged across 11 benchmarks, SLITS achieves a speedup of 1.08X over the de facto standard thread scheduler, the Completely Fair Scheduler, under a 16-core setting for a range of thread counts (32, 64, and 128). Our analysis reveals that the benefits of SLITS stem from significant improvements in cache utilization. In addition, our experimental results confirm that SLITS is scalable and that its benefits are robust as the number of threads increases. We also perform extensive studies to (1) break down SLITS components to justify the synergy of our design choices, (2) examine the impact of varying the estimation coverage of FM, (3) justify the necessity of Lazy Reschedule over periodic rescheduling, and (4) demonstrate that the hardware overheads of SLITS implementations can be marginal (<1% chip area and power).
{"title":"SLITS: Sparsity-Lightened Intelligent Thread Scheduling","authors":"Wangkai Jin, Xiangjun Peng","doi":"10.1145/3579436","DOIUrl":"https://doi.org/10.1145/3579436","url":null,"abstract":"A diverse set of scheduling objectives (e.g., resource contention, fairness, priority, etc.) breed a series of objective-specific schedulers for multi-core architectures. Existing designs incorporate thread-to-thread statistics at runtime, and schedule threads based on such an abstraction (we formalize thread-to-thread interaction as the Thread-Interaction Matrix). However, such an abstraction also reveals a consistently-overlooked issue: the Thread-Interaction Matrix (TIM) is highly sparse. Therefore, existing designs can only deliver sub-optimal decisions, since the sparsity issue limits the amount of thread permutations (and its statistics) to be exploited when performing scheduling decisions. We introduce Sparsity-Lightened Intelligent Thread Scheduling (SLITS), a general scheduler design for mitigating the sparsity issue of TIM, with the customizability for different scheduling objectives. SLITS is designed upon the key insight that: the sparsity issue of the TIM can be effectively mitigated via advanced Machine Learning (ML) techniques. SLITS has three components. First, SLITS profiles Thread Interactions for only a small number of thread permutations, and form the TIM using the run-time statistics. Second, SLITS estimates the missing values in the TIM using Factorization Machine (FM), a novel ML technique that can fill in the missing values within a large-scale sparse matrix based on the limited information. Third, SLITS leverages Lazy Reschedule, a general mechanism as the building block for customizing different scheduling policies for different scheduling objectives. We show how SLITS can be (1) customized for different scheduling objectives, including resource contention and fairness; and (2) implemented with only negligible hardware costs. We also discuss how SLITS can be potentially applied to other contexts of thread scheduling. We evaluate two SLITS variants against four state-of-the-art scheduler designs. We highlight that, averaged across 11 benchmarks, SLITS achieves an average speedup of 1.08X over the de facto standard for thread scheduler - the Completely Fair Scheduler, under the 16-core setting for a variety of number of threads (i.e., 32, 64 and 128). Our analysis reveals that the benefits of SLITS are credited to significant improvements of cache utilization. In addition, our experimental results confirm that SLITS is scalable and the benefits are robust when of the number of threads increases. 
We also perform extensive studies to (1) break down SLITS components to justify the synergy of our design choices, (2) examine the impacts of varying the estimation coverage of FM, (3) justify the necessity of Lazy Reschedule rather than periodic rescheduling, and (4) demonstrate the hardware overheads for SLITS implementations can be marginal (<1% chip area and power).","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125569879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
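To make the matrix-completion idea in the SLITS abstract concrete, the sketch below fills in a sparse interaction matrix by fitting a rank-k factorization to the observed entries only, which is the low-rank core that a Factorization Machine also relies on. It is an illustration of the technique, not the authors' implementation; the matrix size, rank, learning rate, and sampling density are all invented for the example.

```python
import numpy as np

def complete_matrix(observed, mask, k=2, lr=0.01, epochs=2000, seed=0):
    """Estimate missing entries of a sparse matrix with a rank-k
    factorization trained by gradient descent on observed cells only."""
    rng = np.random.default_rng(seed)
    n, m = observed.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(m, k))
    for _ in range(epochs):
        err = mask * (U @ V.T - observed)  # residual on profiled cells only
        gU, gV = err @ V, err.T @ U
        U -= lr * gU
        V -= lr * gV
    return U @ V.T                         # dense estimate of the full TIM

# Toy Thread-Interaction Matrix: 8 threads, ~40% of pairs profiled.
rng = np.random.default_rng(1)
true_tim = rng.uniform(0, 1, (8, 8))
mask = (rng.uniform(size=(8, 8)) < 0.4).astype(float)
estimated_tim = complete_matrix(true_tim * mask, mask)
```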
We describe the results of a large-scale study of third-party dependencies around the world, based on the regional top-500 popular websites accessed from vantage points in 50 countries that together cover all inhabited continents. This broad perspective shows that dependency on a third-party DNS, CDN, or CA provider varies widely around the world, ranging from 19% to as much as 76% of websites across countries. Critical dependencies of websites -- where a site depends on a single third-party provider -- are similarly spread, ranging from 5% to 60% (CDN in Costa Rica and DNS in China, respectively). Interestingly, despite this high variability, our results suggest a highly concentrated market of third-party providers: across all countries, three providers serve an average of 92% of the surveyed websites, and Google, by itself, serves an average of 70%. Even more concerning, these differences persist a year later, with dependencies increasing, particularly for DNS and CDNs. We briefly explore factors that may help explain the differences and similarities in degrees of third-party dependency across countries, including economic conditions, Internet development, economic trading partners, site categories, home countries, and the traffic skewness of each country's top-500 sites.
{"title":"Each at its Own Pace: Third-Party Dependency and Centralization Around the World","authors":"Rashna Kumar, Sana Asif, Elise Lee, F. Bustamante","doi":"10.1145/3579437","DOIUrl":"https://doi.org/10.1145/3579437","url":null,"abstract":"We describe the results of a large-scale study of third-party dependencies around the world based on regional top-500 popular websites accessed from vantage points in 50 countries, together covering all inhabited continents. This broad perspective shows that dependencies on a third-party DNS, CDN or CA provider vary widely around the world, ranging from 19% to as much as 76% of websites, across all countries. The critical dependencies of websites -- where the site depends on a single third-party provider -- are equally spread ranging from 5% to 60% (CDN in Costa Rica and DNS in China, respectively). Interestingly, despite this high variability, our results suggest a highly concentrated market of third-party providers: three providers across all countries serve an average of 92% and Google, by itself, serves an average of 70% of the surveyed websites. Even more concerning, these differences persist a year later with increasing dependencies, particularly for DNS and CDNs. We briefly explore various factors that may help explain the differences and similarities in degrees of third-party dependency across countries, including economic conditions, Internet development, economic trading partners, categories, home countries, and traffic skewness of the country's top-500 sites.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129462707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serkut Ayvaşık, Fidan Mehmeti, Edwin Babaians, W. Kellerer
Up-to-date and accurate prediction of Channel State Information (CSI) is of paramount importance in Ultra-Reliable Low-Latency Communications (URLLC), specifically in dynamic environments where unpredictable mobility is inherent. CSI can be meticulously tracked by means of frequent pilot transmissions, which on the downside increase metadata (overhead signaling) and latency, both of which are detrimental for URLLC. To overcome these issues, in this paper we take a fundamentally different approach and propose PEACH, a machine learning system which utilizes environmental information in the form of depth images to predict CSI amplitude in beyond-5G systems, without requiring metadata radio resources such as pilot overheads or any feedback mechanism. PEACH exploits depth images by employing a convolutional neural network to predict the current and the next 100 ms of CSI amplitudes. The proposed system is experimentally validated with extensive measurements conducted in an indoor environment, involving two static receivers and two transmitters, one of which is placed on top of a mobile robot. We show that environmental information can be instrumental for proactive CSI amplitude acquisition for both static and mobile users at base stations, providing performance nearly on par with pilot-based methods while completely avoiding the dependency on feedback and pilot transmission for both downlink and uplink CSI. Furthermore, compared to traditional pilot estimation based on demodulation reference signals under ideal conditions without interference, our experimental results show that PEACH achieves the same average bit error rate when channel conditions are poor (i.e., with low-order modulation), and is only moderately worse at higher modulation orders such as 16-QAM or 64-QAM. More importantly, in realistic cases with interference taken into account, our experiments demonstrate that PEACH improves the normalized mean square error of CSI amplitude estimation by up to 6 dB compared to traditional approaches.
{"title":"PEACH: Proactive and Environment-Aware Channel State Information Prediction with Depth Images","authors":"Serkut Ayvaşık, Fidan Mehmeti, Edwin Babaians, W. Kellerer","doi":"10.1145/3579450","DOIUrl":"https://doi.org/10.1145/3579450","url":null,"abstract":"Up-to-date and accurate prediction of Channel State Information (CSI) is of paramount importance in Ultra-Reliable Low-Latency Communications (URLLC), specifically in dynamic environments where unpredictable mobility is inherent. CSI can be meticulously tracked by means of frequent pilot transmissions, which on the downside lead to an increase in metadata (overhead signaling) and latency, which are both detrimental for URLLC. To overcome these issues, in this paper, we take a fundamentally different approach and propose PEACH, a machine learning system which utilizes environmental information with depth images to predict CSI amplitude in beyond 5G systems, without requiring metadata radio resources, such as pilot overheads or any feedback mechanism. PEACH exploits depth images by employing a convolutional neural network to predict the current and the next 100 ms CSI amplitudes. The proposed system is experimentally validated with extensive measurements conducted in an indoor environment, involving two static receivers and two transmitters, one of which is placed on top of a mobile robot. We prove that environmental information can be instrumental towards proactive CSI amplitude acquisition of both static and mobile users on base stations, while providing an almost similar performance as pilot-based methods, and completely avoiding the dependency on feedback and pilot transmission for both downlink and uplink CSI information. Furthermore, compared to demodulation reference signal based traditional pilot estimation in ideal conditions without interference, our experimental results show that PEACH yields the same performance in terms of average bit error rate when channel conditions are poor (using low order modulation), while not being much worse when using higher modulation orders, like 16-QAM or 64-QAM. More importantly, in the realistic cases with interference taken into account, our experiments demonstrate considerable improvements introduced by PEACH in terms of normalized mean square error of CSI amplitude estimation, up to 6 dB, when compared to traditional approaches.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127532417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we also face two practical challenges: communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases to collect feedback with reduced bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of Õ(T^(1-α/2) + √(γ_T T)), where α ∈ (0,1) is a tunable user-sampling parameter and γ_T is the maximum information gain. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to variants of state-of-the-art algorithms originally developed for standard kernelized bandits. Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
{"title":"(Private) Kernelized Bandits with Distributed Biased Feedback","authors":"Fengjiao Li, Xingyu Zhou, Bo Ji","doi":"10.1145/3579318","DOIUrl":"https://doi.org/10.1145/3579318","url":null,"abstract":"In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we are also faced with two practical challenges due to communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of ~O(T1-α/2 +√γT T), where α ∈ (0,1) is the user-sampling parameter one can tune. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to some variants of the state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133484410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kailong Wang, Yuxi Ling, Yanjun Zhang, Zhou Yu, Haoyu Wang, Guangdong Bai, B. Ooi, J. Dong
Due to the surging popularity of various cryptocurrencies in recent years, a large number of browser extensions have been developed as portals to relevant services, such as cryptocurrency exchanges and wallets. This has stimulated a wild growth of cryptocurrency-themed malicious extensions that cause heavy financial losses to users and legitimate service providers. Such extensions have shown their capability to evade the stringent vetting processes of extension stores, highlighting a lack of understanding of this emerging type of malware in our community. In this work, we conduct the first systematic study to identify and characterize cryptocurrency-themed malicious extensions. We monitored seven official and third-party extension distribution venues for 18 months (December 2020 to June 2022) and collected around 3,600 unique cryptocurrency-themed extensions. Leveraging a hybrid analysis, we identified 186 malicious extensions that fall into five categories. We then characterize these extensions from various perspectives, including their distribution channels, life cycles, developers, illicit behaviors, and illegal gains. Our work unveils the status quo of cryptocurrency-themed malicious extensions and reveals their disguises and programmatic features, on which detection techniques can be based. It serves as a warning to extension users and an appeal to extension store operators to enact dedicated countermeasures. To facilitate future research in this area, we release our dataset of the identified malicious extensions and open-source our analyzer.
{"title":"Characterizing Cryptocurrency-themed Malicious Browser Extensions","authors":"Kailong Wang, Yuxi Ling, Yanjun Zhang, Zhou Yu, Haoyu Wang, Guangdong Bai, B. Ooi, J. Dong","doi":"10.1145/3570603","DOIUrl":"https://doi.org/10.1145/3570603","url":null,"abstract":"Due to the surging popularity of various cryptocurrencies in recent years, a large number of browser extensions have been developed as portals to access relevant services, such as cryptocurrency exchanges and wallets. This has stimulated a wild growth of cryptocurrency themed malicious extensions that cause heavy financial losses to the users and legitimate service providers. They have shown their capability of evading the stringent vetting processes of the extension stores, highlighting a lack of understanding of this emerging type of malware in our community. In this work, we conduct the first systematic study to identify and characterize cryptocurrency-themed malicious extensions. We monitor seven official and third-party extension distribution venues for 18 months (December 2020 to June 2022) and have collected around 3600 unique cryptocurrency-themed extensions. Leveraging a hybrid analysis, we have identified 186 malicious extensions that belong to five categories. We then characterize those extensions from various perspectives including their distribution channels, life cycles, developers, illicit behaviors, and illegal gains. Our work unveils the status quo of the cryptocurrency-themed malicious extensions and reveals their disguises and programmatic features on which detection techniques can be based. Our work serves as a warning to extension users, and an appeal to extension store operators to enact dedicated countermeasures. To facilitate future research in this area, we release our dataset of the identified malicious extensions and open-source our analyzer.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126455028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The MinUsageTime Dynamic Bin Packing (DBP) problem aims to minimize the accumulated bin usage time when packing a sequence of items into bins. It is often used to model job dispatching for optimizing the busy time of servers, where items and bins correspond to jobs and servers, respectively. It is known that the competitiveness of MinUsageTime DBP has tight bounds of Θ(√(log μ)) and Θ(μ) in the clairvoyant and non-clairvoyant settings, respectively, where μ is the max/min duration ratio of all items. In practice, information about item durations (i.e., job lengths) obtained via predictions is usually prone to errors. In this paper, we study the MinUsageTime DBP problem with predictions of item durations. We find that an existing O(√(log μ))-competitive clairvoyant algorithm, when using predicted durations rather than real durations for packing, does not provide any bounded performance guarantee when the predictions are adversarially bad. We develop a new online algorithm with a competitive ratio of min{O(ε²√(log(ε²μ))), O(μ)}, where ε is the maximum multiplicative prediction error among all items, achieving O(√(log μ)) consistency (competitiveness under perfect predictions, where ε = 1) and O(μ) robustness (competitiveness under terrible predictions), both of which are asymptotically optimal.
{"title":"Dynamic Bin Packing with Predictions","authors":"Mozhengfu Liu, Xueyan Tang","doi":"10.1145/3570605","DOIUrl":"https://doi.org/10.1145/3570605","url":null,"abstract":"The MinUsageTime Dynamic Bin Packing (DBP) problem aims to minimize the accumulated bin usage time for packing a sequence of items into bins. It is often used to model job dispatching for optimizing the busy time of servers, where the items and bins match the jobs and servers respectively. It is known that the competitiveness of MinUsageTime DBP has tight bounds of Θ(√łog μ ) and Θ(μ) in the clairvoyant and non-clairvoyant settings respectively, where μ is the max/min duration ratio of all items. In practice, the information about the items' durations (i.e., job lengths) obtained via predictions is usually prone to errors. In this paper, we study the MinUsageTime DBP problem with predictions of the items' durations. We find that an existing O(√łog μ )-competitive clairvoyant algorithm, if using predicted durations rather than real durations for packing, does not provide any bounded performance guarantee when the predictions are adversarially bad. We develop a new online algorithm with a competitive ratio of minØ(ε^2 √łog(ε^2 μ) ), O(μ) (where ε is the maximum multiplicative error of prediction among all items), achieving O(√łog μ) consistency (competitiveness under perfect predictions where ε = 1) and O(μ) robustness (competitiveness under terrible predictions), both of which are asymptotically optimal.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132932759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of balancing the load among servers in dense racks for microsecond-scale workloads. To balance the load in such settings, tens of millions of scheduling decisions must be made per second. Achieving this throughput while providing microsecond-scale latency and high availability is extremely challenging. To address this challenge, we design Malcolm, a fully decentralized load-balancing framework in which servers collectively balance the load in the system. We model the interactions among servers as a cooperative stochastic game. To find the game's parametric Nash equilibrium, we design and implement a decentralized algorithm based on multi-agent-learning theory. We empirically show that our proposed algorithm is adaptive and scalable while outperforming state-of-the-art alternatives. In homogeneous settings, Malcolm performs as well as the best of the baselines. In heterogeneous settings, at lower loads, Malcolm improves tail latency by up to a factor of four over the baselines, and for the same tail latency, it achieves up to 60% more throughput than the best alternative among them.
{"title":"Malcolm: Multi-agent Learning for Cooperative Load Management at Rack Scale","authors":"Ali Hossein Abbasi Abyaneh, Maizi Liao, S. Zahedi","doi":"10.1145/3570611","DOIUrl":"https://doi.org/10.1145/3570611","url":null,"abstract":"We consider the problem of balancing the load among servers in dense racks for microsecond-scale workloads. To balance the load in such settings tens of millions of scheduling decisions have to be made per second. Achieving this throughput while providing microsecond-scale latency and high availability is extremely challenging. To address this challenge, we design a fully decentralized load-balancing framework. In this framework, servers collectively balance the load in the system. We model the interactions among servers as a cooperative stochastic game. To find the game's parametric Nash equilibrium, we design and implement a decentralized algorithm based on multi-agent-learning theory. We empirically show that our proposed algorithm is adaptive and scalable while outperforming state-of-the art alternatives. In homogeneous settings, Malcolm performs as well as the best alternative among other baselines. In heterogeneous settings, compared to other baselines, for lower loads, Malcolm improves tail latency by up to a factor of four. And for the same tail latency, Malcolm achieves up to 60% more throughput compared to the best alternative among other baselines.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125050835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniele De Sensi, T. De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler
Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit application scaling. This work analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems. First, we consider latency, bandwidth, and collective communication patterns in detailed small-scale measurements, and then we simulate network performance at larger scale. We validate our approach on four popular cloud providers and three on-premise HPC systems, showing that network (and also OS) noise can significantly impact performance and cost at both small and large scale.
{"title":"Noise in the Clouds","authors":"Daniele De Sensi, T. De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler","doi":"10.1145/3570609","DOIUrl":"https://doi.org/10.1145/3570609","url":null,"abstract":"Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit the application scaling. This work analyzes network performance, scalability, and cost of running HPC workloads on cloud systems. First, we consider latency, bandwidth, and collective communication patterns in detailed small-scale measurements, and then we simulate network performance at a larger scale. We validate our approach on four popular cloud providers and three on-premise HPC systems, showing that network (and also OS) noise can significantly impact performance and cost both at small and large scale.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic, with unknown arrival rates and service rates. Upon the completion of each job from dispatcher u_n at server s_m, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to the arrival rates, service rates, and underlying utilities. We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by the arrival rates and service rates. We next propose a policy for the optimal control problem that integrates a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme-point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Join-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme-point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret, whereas applying existing techniques to the optimal control problem would lead to Ω(√T) regret. The theoretical analysis is further complemented by simulations evaluating the empirical performance of our policy.
{"title":"Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities","authors":"Xinzhe Fu, E. Modiano","doi":"10.1145/3570619","DOIUrl":"https://doi.org/10.1145/3570619","url":null,"abstract":"We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher un at server sm, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities. We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Joint-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T)-regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126615133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}