In coding-based distributed storage systems (DSSs), a set of storage nodes (SNs) hold coded fragments of a data unit that collectively allow one to recover the original information. It is well known that data modification (a.k.a. pollution attack) is the Achilles' heel of such coding systems; indeed, intentional modification of a single coded fragment can prevent the reconstruction of the original information because of the error propagation induced by the decoding algorithm. The challenge we take on in this work is to devise an algorithm that identifies polluted coded fragments within the set encoding a data unit and to characterize its performance. To this end, we provide the following contributions: (i) we devise MIND (Malicious node IdeNtification in DSS), an algorithm that is general with respect to the encoding mechanism chosen for the DSS, is able to cope with a heterogeneous allocation of coded fragments to SNs, and is effective in identifying polluted coded fragments in a low-redundancy scenario; (ii) we formally prove both the termination and the correctness of MIND; (iii) we derive an accurate analytical characterization of MIND's performance (hit probability and complexity); (iv) we develop a C++ prototype that implements MIND and validates the performance predictions of the analytical model. Finally, to show the applicability of our work, we define performance and robustness metrics for an allocation of coded fragments to SNs and apply the analytical characterization of MIND's performance to select coded fragment allocations that are robust to collusion and yield the highest probability of identifying actual attackers.
{"title":"Malicious Node Identification in Coded Distributed Storage Systems under Pollution Attacks","authors":"R. Gaeta, Marco Grangetto","doi":"10.1145/3491062","DOIUrl":"https://doi.org/10.1145/3491062","url":null,"abstract":"In coding-based distributed storage systems (DSSs), a set of storage nodes (SNs) hold coded fragments of a data unit that collectively allow one to recover the original information. It is well known that data modification (a.k.a. pollution attack) is the Achilles’ heel of such coding systems; indeed, intentional modification of a single coded fragment has the potential to prevent the reconstruction of the original information because of error propagation induced by the decoding algorithm. The challenge we take in this work is to devise an algorithm to identify polluted coded fragments within the set encoding a data unit and to characterize its performance.\u0000 To this end, we provide the following contributions: (i) We devise MIND (Malicious node IdeNtification in DSS), an algorithm that is general with respect to the encoding mechanism chosen for the DSS, it is able to cope with a heterogeneous allocation of coded fragments to SNs, and it is effective in successfully identifying polluted coded fragments in a low-redundancy scenario; (ii) We formally prove both MIND termination and correctness; (iii) We derive an accurate analytical characterization of MIND performance (hit probability and complexity); (iv) We develop a C++ prototype that implements MIND to validate the performance predictions of the analytical model.\u0000 Finally, to show applicability of our work, we define performance and robustness metrics for an allocation of coded fragments to SNs and we apply the results of the analytical characterization of MIND performance to select coded fragments allocations yielding robustness to collusion as well as the highest probability to identify actual attackers.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"28 1","pages":"12:1-12:27"},"PeriodicalIF":0.6,"publicationDate":"2021-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78185062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In data center networks, the reliability of a Service Function Chain (SFC), an end-to-end service presented by a chain of virtual network functions (VNFs), is a complex and specific function of placement, configuration, and application requirements, in both hardware and software. Existing approaches to reliability analysis do not jointly consider multiple features of system components, including (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency. To this end, we develop a novel analysis of the service reliability of the so-called generic SFC, consisting of n = k + r sub-SFCs, where k ≥ 1 and r ≥ 0 are the numbers of arbitrarily placed primary and backup (redundant) sub-SFCs, respectively. Our analysis is based on combinatorics and a reduced binomial theorem, resulting in a simple approach that can nevertheless be used to analyze rather complex SFC configurations. The analysis is practically applicable to various VNF placement strategies in arbitrary data center configurations and topologies and can be effectively used for the evaluation and optimization of reliable SFC placements.
{"title":"A Combinatorial Reliability Analysis of Generic Service Function Chains in Data Center Networks","authors":"Anna Engelmann, A. Jukan","doi":"10.1145/3477046","DOIUrl":"https://doi.org/10.1145/3477046","url":null,"abstract":"\u0000 In data center networks, the reliability of Service Function Chain (SFC)—an end-to-end service presented by a chain of virtual network functions (VNFs)—is a complex and specific function of placement, configuration, and application requirements, both in hardware and software. Existing approaches to reliability analysis do not jointly consider multiple features of system components, including, (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency. To this end, we develop a novel analysis of service reliability of the so-called\u0000 generic SFC,\u0000 consisting of\u0000 n\u0000 =\u0000 k\u0000 +\u0000 r\u0000 sub-SFCs, whereby\u0000 k\u0000 ≥ 1 and\u0000 r\u0000 ≥ 0 are the numbers of arbitrary placed primary and backup (redundant) sub-SFCs, respectively. Our analysis is based on combinatorics and a reduced binomial theorem—resulting in a simple approach, which, however, can be utilized to analyze rather complex SFC configurations. The analysis is practically applicable to various VNF placement strategies in arbitrary data center configurations, and topologies and can be effectively used for evaluation and optimization of reliable SFC placements.\u0000","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"35 4 1","pages":"9:1-9:24"},"PeriodicalIF":0.6,"publicationDate":"2021-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77233559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most IT systems depend on a set of configuration variables (CVs), each expressed as a name/value pair, that collectively define the resource allocation for the system. While the ill effects of misconfiguration or improper resource allocation are well known, there are no effective a priori metrics to quantify the impact of the configuration on desired system attributes such as performance and availability. In this paper, we propose a Configuration Health Index (CHI) framework specifically attuned to the performance attribute to capture the influence of CVs on the performance aspects of the system. We show how CHI, which is defined as a configuration scoring system, can take advantage of domain knowledge and the available (but rather limited) performance data to produce important insights into the configuration settings. We compare CHI with both well-advertised segmented non-linear models and state-of-the-art data-driven models, and show that CHI not only consistently provides better results but also avoids the dangers of a purely data-driven approach, which may predict incorrect behavior or eliminate some essential configuration variables from consideration.
{"title":"Performance Health Index for Complex Cyber Infrastructures","authors":"Sanjeev Sondur, K. Kant","doi":"10.1145/3538646","DOIUrl":"https://doi.org/10.1145/3538646","url":null,"abstract":"Most IT systems depend on a set of configuration variables (CVs), expressed as a name/value pair that collectively defines the resource allocation for the system. While the ill effects of misconfiguration or improper resource allocation are well-known, there are no effective a priori metrics to quantify the impact of the configuration on the desired system attributes such as performance, availability, etc. In this paper, we propose a Configuration Health Index (CHI) framework specifically attuned to the performance attribute to capture the influence of CVs on the performance aspects of the system. We show how CHI, which is defined as a configuration scoring system, can take advantage of the domain knowledge and the available (but rather limited) performance data to produce important insights into the configuration settings. We compare the CHI with both well-advertised segmented non-linear models and state-of-the-art data-driven models, and show that the CHI not only consistently provides better results but also avoids the dangers of a pure data drive approach which may predict incorrect behavior or eliminate some essential configuration variables from consideration.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"7 1","pages":"1 - 32"},"PeriodicalIF":0.6,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42536638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed ledgers (DLs) provide many advantages over centralized solutions in Internet of Things (IoT) projects, including but not limited to improved security, transparency, and fault tolerance. To leverage DLs at scale, their well-known limitation, namely performance, must be adequately analyzed and addressed. Directed acyclic graph (DAG)-based DLs have been proposed to tackle the performance and scalability issues by design. The first among them, IOTA, has shown promising signs of addressing these issues. IOTA is an open-source DL designed for the Internet of Things. It uses a directed acyclic graph to store transactions on its ledger and thereby achieves potentially higher scalability than blockchain-based DLs. However, due to the uncertainty and centralization of the deployed consensus, the current IOTA implementation exhibits some performance issues, making it less performant than the initial design. In this article, we first extend an existing simulator to support realistic IOTA simulations and investigate the impact of different design parameters on IOTA's performance. Then, we propose a layered model to help users of IOTA determine the optimal waiting time before resending a previously submitted but not yet confirmed transaction. Our findings reveal the impact of the transaction arrival rate, tip selection algorithms, weighted tip selection algorithm randomness, and network delay on the throughput. Using the proposed layered model, we shed some light on the distribution of the confirmed transactions, which is then leveraged to calculate the optimal time for resending an unconfirmed transaction to the DL. The performance analysis results can be used by both system designers and users to support their decision making.
{"title":"Performance Analysis of the IOTA DAG-Based Distributed Ledger","authors":"Caixiang Fan, Sara Ghaemi, Hamzeh Khazaei, Yuxiang Chen, P. Musílek","doi":"10.7939/R3-W1C1-WT05","DOIUrl":"https://doi.org/10.7939/R3-W1C1-WT05","url":null,"abstract":"Distributed ledgers (DLs) provide many advantages over centralized solutions in Internet of Things projects, including but not limited to improved security, transparency, and fault tolerance. To leverage DLs at scale, their well-known limitation (i.e., performance) should be adequately analyzed and addressed. Directed acyclic graph-based DLs have been proposed to tackle the performance and scalability issues by design. The first among them, IOTA, has shown promising signs in addressing the preceding issues. IOTA is an open source DL designed for the Internet of Things. It uses a directed acyclic graph to store transactions on its ledger, to achieve a potentially higher scalability over blockchain-based DLs. However, due to the uncertainty and centralization of the deployed consensus, the current IOTA implementation exposes some performance issues, making it less performant than the initial design. In this article, we first extend an existing simulator to support realistic IOTA simulations and investigate the impact of different design parameters on IOTA’s performance. Then, we propose a layered model to help the users of IOTA determine the optimal waiting time to resend the previously submitted but not yet confirmed transaction. Our findings reveal the impact of the transaction arrival rate, tip selection algorithms, weighted tip selection algorithm randomness, and network delay on the throughput. Using the proposed layered model, we shed some light on the distribution of the confirmed transactions. The distribution is leveraged to calculate the optimal time for resending an unconfirmed transaction to the DL. The performance analysis results can be used by both system designers and users to support their decision making.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"6 1","pages":"10:1-10:20"},"PeriodicalIF":0.6,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86032559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continuous enhancements and diversity in modern multi-core hardware, such as wider and deeper core pipelines and memory subsystems, pose a set of hard-to-solve challenges when modeling their upper-bound capabilities and identifying the main application bottlenecks. Insightful roofline models are widely used for this purpose, but existing approaches overly abstract the micro-architecture complexity, thus providing unrealistic performance bounds that lead to a misleading characterization of real-world applications. To address this problem, the Mansard Roofline Model (MaRM), proposed in this work, uncovers a minimum set of architectural features that must be considered to provide insightful, yet accurate and realistic, modeling of performance upper bounds for modern processors. By encapsulating the retirement constraints due to the number of retirement slots and the Reorder Buffer and Physical Register File sizes, the proposed model accurately captures the capabilities of a real platform (average rRMSE of 5.4%) and characterizes 12 application kernels from standard benchmark suites. By following the MaRM interpretation methodology and guidelines proposed herein, speedups of up to 5× are obtained when optimizing a real-world bioinformatics application, as well as a super-linear speedup of 18.5× when it is parallelized.
{"title":"Mansard Roofline Model: Reinforcing the Accuracy of the Roofs","authors":"Diogo Marques, A. Ilic, L. Sousa","doi":"10.1145/3475866","DOIUrl":"https://doi.org/10.1145/3475866","url":null,"abstract":"Continuous enhancements and diversity in modern multi-core hardware, such as wider and deeper core pipelines and memory subsystems, bring to practice a set of hard-to-solve challenges when modeling their upper-bound capabilities and identifying the main application bottlenecks. Insightful roofline models are widely used for this purpose, but the existing approaches overly abstract the micro-architecture complexity, thus providing unrealistic performance bounds that lead to a misleading characterization of real-world applications. To address this problem, the Mansard Roofline Model (MaRM), proposed in this work, uncovers a minimum set of architectural features that must be considered to provide insightful, but yet accurate and realistic, modeling of performance upper bounds for modern processors. By encapsulating the retirement constraints due to the amount of retirement slots, Reorder-Buffer and Physical Register File sizes, the proposed model accurately models the capabilities of a real platform (average rRMSE of 5.4%) and characterizes 12 application kernels from standard benchmark suites. By following a herein proposed MaRM interpretation methodology and guidelines, speed-ups of up to 5× are obtained when optimizing real-world bioinformatic application, as well as a super-linear speedup of 18.5× when parallelized.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"6 1","pages":"1 - 23"},"PeriodicalIF":0.6,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44571161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Randomized work stealing is used in distributed systems to increase performance and improve resource utilization. In this article, we consider randomized work stealing in a large system of homogeneous processors where parent jobs spawn child jobs that can feasibly be executed in parallel with the parent job. We analyse the performance of two work stealing strategies: one where only child jobs can be transferred across servers and the other where parent jobs are transferred. We define a mean-field model to derive the response time distribution in a large-scale system with Poisson arrivals and exponential parent and child job durations. We prove that the model has a unique fixed point that corresponds to the steady state of a structured Markov chain, allowing us to use matrix analytic methods to compute the unique fixed point. The accuracy of the mean-field model is validated using simulation. Using numerical examples, we illustrate the effect of different probe rates, load, and different child job size distributions on performance with respect to the two stealing strategies, individually, and compared to each other.
{"title":"Performance Analysis of Work Stealing in Large-scale Multithreaded Computing","authors":"Nikki Sonenberg, Grzegorz Kielanski, B. Van Houdt","doi":"10.1145/3470887","DOIUrl":"https://doi.org/10.1145/3470887","url":null,"abstract":"Randomized work stealing is used in distributed systems to increase performance and improve resource utilization. In this article, we consider randomized work stealing in a large system of homogeneous processors where parent jobs spawn child jobs that can feasibly be executed in parallel with the parent job. We analyse the performance of two work stealing strategies: one where only child jobs can be transferred across servers and the other where parent jobs are transferred. We define a mean-field model to derive the response time distribution in a large-scale system with Poisson arrivals and exponential parent and child job durations. We prove that the model has a unique fixed point that corresponds to the steady state of a structured Markov chain, allowing us to use matrix analytic methods to compute the unique fixed point. The accuracy of the mean-field model is validated using simulation. Using numerical examples, we illustrate the effect of different probe rates, load, and different child job size distributions on performance with respect to the two stealing strategies, individually, and compared to each other.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"6 1","pages":"1 - 28"},"PeriodicalIF":0.6,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41749830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caching systems have long been crucial for improving the performance of a wide variety of network and web-based online applications. In such systems, end-to-end application performance heavily depends on the fraction of objects transferred from the cache, also known as the cache hit probability. Many caching policies have been proposed and implemented to improve the hit probability. In this work, we propose a new method to compute an upper bound on the hit probability for all non-anticipative caching policies, that is, policies that have no knowledge of future requests. Our key insight is to order the objects according to the ratio of their Hazard Rate (HR) function values to their sizes and to place in the cache the objects with the largest ratios until the cache capacity is exhausted. When object request processes are conditionally independent, we prove that the cache allocation based on this HR-to-size ratio rule guarantees the maximum achievable expected number of object hits across all non-anticipative caching policies. Further, the HR ordering rule yields an upper bound on the cache hit probability when object request processes follow either an independent delayed renewal process or a Markov-modulated Poisson process. We also derive closed-form expressions for the upper bound under some specific object request arrival processes. We provide simulation results to validate its correctness and to compare it to state-of-the-art upper bounds, such as the one produced by Bélády's algorithm. We find it to be tighter than state-of-the-art upper bounds for some specific object request arrival processes, such as independent renewal, Markov-modulated, and shot noise processes.
{"title":"A New Upper Bound on Cache Hit Probability for Non-Anticipative Caching Policies","authors":"Nitish K. Panigrahy, P. Nain, G. Neglia, D. Towsley","doi":"10.1145/3547332","DOIUrl":"https://doi.org/10.1145/3547332","url":null,"abstract":"Caching systems have long been crucial for improving the performance of a wide variety of network and web-based online applications. In such systems, end-to-end application performance heavily depends on the fraction of objects transferred from the cache, also known as the cache hit probability. Many caching policies have been proposed and implemented to improve the hit probability. In this work, we propose a new method to compute an upper bound on hit probability for all non-anticipative caching policies and for policies that have no knowledge of future requests. Our key insight is to order the objects according to the ratio of their Hazard Rate(HR) function values to their sizes, and place in the cache the objects with the largest ratios till the cache capacity is exhausted. When object request processes are conditionally independent, we prove that this cache allocation based on the HR-to-size ratio rule guarantees the maximum achievable expected number of object hits across all non-anticipative caching policies. Further, the HR ordering rule serves as an upper bound on cache hit probability when object request processes follow either independent delayed renewal process or a Markov modulated Poisson process. We also derive closed form expressions for the upper bound under some specific object request arrival processes. We provide simulation results to validate its correctness and to compare it to the state-of-the-art upper bounds, such as produced by Bélády’s algorithm. We find it to be tighter than state-of-the-art upper bounds for some specific object request arrival processes such as independent renewal, Markov modulated, and shot noise processes.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"7 1","pages":"1 - 24"},"PeriodicalIF":0.6,"publicationDate":"2021-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49153406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the advent of the Internet of Things (IoT), applications are becoming increasingly dependent on networks not only to transmit content at high throughput but also to deliver it when it is fresh, i.e., synchronized between source and destination. Existing studies have proposed the metric age of information (AoI) to quantify freshness and have presented system designs that achieve low AoI. However, despite active research in this area, existing results are not applicable to general wired networks for two reasons. First, they focus on wireless settings, where AoI is mostly affected by interference and collisions, while queueing issues are more prevalent in wired settings. Second, traditional high-throughput/low-latency legacy drop-adverse (LDA) flows are not taken into account in most system designs; hence, the problem of scheduling mixed flows with distinct performance objectives is not addressed. In this article, we propose a hierarchical system design to handle wired networks shared by mixed flow traffic, specifically LDA and AoI flows, and study the characteristics of achieving a good tradeoff between throughput and AoI. Our approach consists of two layers: freshness-aware traffic engineering (FATE) and in-network freshness control (IFC). The centralized FATE solution studies the characteristics of the source flows to derive the sending rate/update frequency for flows via the optimization problem LDA-AoI Coscheduling. The parameters specified by FATE are then distributed to IFC, which is implemented at each output port of the network's nodes and used for efficient scheduling between LDA and AoI flows. We present a Linux implementation of IFC and demonstrate the effectiveness of FATE/IFC through extensive emulations. Our results show that it is possible to trade a little throughput (5% lower) for much shorter AoI (49% to 71% shorter) compared to state-of-the-art traffic engineering.
{"title":"Trading Throughput for Freshness: Freshness-aware Traffic Engineering and In-Network Freshness Control","authors":"Shih-Hao Tseng, Soojean Han, A. Wierman","doi":"10.1145/3576919","DOIUrl":"https://doi.org/10.1145/3576919","url":null,"abstract":"With the advent of the Internet of Things (IoT), applications are becoming increasingly dependent on networks to not only transmit content at high throughput but also deliver it when it is fresh, i.e., synchronized between source and destination. Existing studies have proposed the metric age of information (AoI) to quantify freshness and have system designs that achieve low AoI. However, despite active research in this area, existing results are not applicable to general wired networks for two reasons. First, they focus on wireless settings, where AoI is mostly affected by interference and collision, while queueing issues are more prevalent in wired settings. Second, traditional high-throughput/low-latency legacy drop-adverse (LDA) flows are not taken into account in most system designs; hence, the problem of scheduling mixed flows with distinct performance objectives is not addressed. In this article, we propose a hierarchical system design to treat wired networks shared by mixed flow traffic, specifically LDA and AoI flows, and study the characteristics of achieving a good tradeoff between throughput and AoI. Our approach to the problem consists of two layers: freshness-aware traffic engineering (FATE) and in-network freshness control (IFC). The centralized FATE solution studies the characteristics of the source flow to derive the sending rate/update frequency for flows via the optimization problem LDA-AoI Coscheduling. The parameters specified by FATE are then distributed to IFC, which is implemented at each outport of the network’s nodes and used for efficient scheduling between LDA and AoI flows. We present a Linux implementation of IFC and demonstrate the effectiveness of FATE/IFC through extensive emulations. Our results show that it is possible to trade a little throughput (5% lower) for much shorter AoI (49% to 71% shorter) compared to state-of-the-art traffic engineering.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"8 1","pages":"1 - 26"},"PeriodicalIF":0.6,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44252251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As video-streaming services have expanded and improved, cloud-based video has evolved into a necessary feature of any successful business for reaching internal and external audiences. In this article, video streaming over distributed storage is considered, where the video segments are encoded using an erasure code for better reliability. We consider a representative system architecture for a realistic (typical) content delivery network (CDN). Given multiple parallel streams/links between each server and the edge router, we need to determine, for each client request, the subset of servers from which to stream the video, as well as one of the parallel streams from each chosen server. To perform this scheduling, this article proposes a two-stage probabilistic scheduling approach. The video quality is also selected with a certain probability distribution that is optimized in our algorithm. With these parameters, the playback time of video segments is determined by characterizing the download time of each coded chunk of each video segment. Using the playback times, a bound on the moment generating function of the stall duration is used to bound the mean stall duration. Based on this, we formulate an optimization problem to jointly optimize a convex combination of the mean stall duration and the average video quality for all requests, where the two-stage probabilistic scheduling, video quality selection, bandwidth split among parallel streams, and auxiliary bound parameters can be chosen. This non-convex problem is solved using an efficient iterative algorithm. Based on the offline version of our proposed algorithm, an online policy is developed in which server selection, quality, bandwidth split, and parallel streams are selected in an online manner. Experimental results show significant improvement in QoE metrics for cloud-based video compared to the considered baselines.
{"title":"VidCloud: Joint Stall and Quality Optimization for Video Streaming over Cloud","authors":"A. Al-Abbasi, V. Aggarwal","doi":"10.1145/3442187","DOIUrl":"https://doi.org/10.1145/3442187","url":null,"abstract":"As video-streaming services have expanded and improved, cloud-based video has evolved into a necessary feature of any successful business for reaching internal and external audiences. In this article, video streaming over distributed storage is considered where the video segments are encoded using an erasure code for better reliability. We consider a representative system architecture for a realistic (typical) content delivery network (CDN). Given multiple parallel streams/link between each server and the edge router, we need to determine, for each client request, the subset of servers to stream the video, as well as one of the parallel streams from each chosen server. To have this scheduling, this article proposes a two-stage probabilistic scheduling. The selection of video quality is also chosen with a certain probability distribution that is optimized in our algorithm. With these parameters, the playback time of video segments is determined by characterizing the download time of each coded chunk for each video segment. Using the playback times, a bound on the moment generating function of the stall duration is used to bound the mean stall duration. Based on this, we formulate an optimization problem to jointly optimize the convex combination of mean stall duration and average video quality for all requests, where the two-stage probabilistic scheduling, video quality selection, bandwidth split among parallel streams, and auxiliary bound parameters can be chosen. This non-convex problem is solved using an efficient iterative algorithm. Based on the offline version of our proposed algorithm, an online policy is developed where servers selection, quality, bandwidth split, and parallel streams are selected in an online manner. Experimental results show significant improvement in QoE metrics for cloud-based video as compared to the considered baselines.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"97 1","pages":"17:1-17:32"},"PeriodicalIF":0.6,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91022446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
{"title":"RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems","authors":"Bai Liu, Qiaomin Xie, E. Modiano","doi":"10.1145/3529375","DOIUrl":"https://doi.org/10.1145/3529375","url":null,"abstract":"With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.","PeriodicalId":56350,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems","volume":"7 1","pages":"1 - 35"},"PeriodicalIF":0.6,"publicationDate":"2020-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45155339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}