Current state-of-the-art systems contain various types of multicore processors, General-Purpose Graphics Processing Units (GPGPUs), and occasionally Digital Signal Processors (DSPs) or Field-Programmable Gate Arrays (FPGAs). With heterogeneity come multiple abstraction layers that hide the underlying complexity. While these layers are necessary to ease the programmability of such systems, the complexity they hide makes quantitative performance modeling a difficult task. This paper outlines a computationally simple approach to modeling the overall throughput and buffering needs of a streaming application deployed on heterogeneous hardware.
{"title":"Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications","authors":"J. Beard, R. Chamberlain","doi":"10.1109/MASCOTS.2013.49","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.49","url":null,"abstract":"Current state of the art systems contain various types of multicore processors, General Purpose Graphics Processing Units (GPGPUs) and occasionally Digital Signal Processors (DSPs) or Field-Programmable Gate Arrays (FPGAs). With heterogeneity comes multiple abstraction layers that hide underlying complexity. While necessary to ease programmability of these systems, this hidden complexity makes quantitative performance modeling a difficult task. This paper outlines a computationally simple approach to modeling the overall throughput and buffering needs of a streaming application deployed on heterogeneous hardware.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114984031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is common nowadays to architect and design scaled-out systems from off-the-shelf computing components operated and managed by off-the-shelf open-source tools. While web services represent the critical set of services offered at scale, big data analytics is emerging as a preferred service to be colocated with cloud web services at a lower priority, raising the need for off-the-shelf priority scheduling. In this paper, we report on the perils of Linux priority scheduling tools when used to differentiate between such complex services. We demonstrate that simple priority scheduling utilities such as nice and ionice can result in dramatically erratic behavior. We provide a remedy by proposing an autonomic priority scheduling algorithm that adjusts its execution parameters based on on-line measurements of the current resource usage of critical applications. Detailed experimentation with a user-space prototype of the algorithm on a Linux system, using popular benchmarks such as SPEC and TPC-W, illustrates the robustness and versatility of the proposed technique: it delivers consistent, predictable performance for a high-priority application running simultaneously with multiple low-priority jobs.
{"title":"Overcoming Limitations of Off-the-Shelf Priority Schedulers in Dynamic Environments","authors":"Feng Yan, S. Hughes, Alma Riska, E. Smirni","doi":"10.1109/MASCOTS.2013.72","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.72","url":null,"abstract":"It is common nowadays to architect and design scaled-out systems with off-the-shelf computing components operated and managed by off-the-shelf open-source tools. While web services represent the critical set of services offered at scale, big data analytics is emerging as a preferred service to be colocated with cloud web services at a lower priority raising the need for off-the-shelf priority scheduling. In this paper we report on the perils of Linux priority scheduling tools when used to differentiate between such complex services. We demonstrate that simple priority scheduling utilities such as nice and ionice can result in dramatically erratic behavior. We provide a remedy by proposing an autonomic priority scheduling algorithm that adjusts its execution parameters based on on-line measurements of the current resource usage of critical applications. Detailed experimentation with a user-space prototype of the algorithm on a Linux system using popular benchmarks such as SPEC and TPC-W illustrate the robustness and versatility of the proposed technique, as it provides consistency to the expected performance of a high-priority application when running simultaneously with multiple low priority jobs.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130651136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a simulation architecture for analyzing the performance of Power Line Communication (PLC) technology. Specifically, it studies the viability of the PRIME standard for sending Automatic Meter Reading (AMR) messages over a low-voltage network. In contrast with other studies, physical phenomena (such as background and impulsive noise sources, channel attenuation, and multipath effects) are taken into account through Matlab simulations. Additionally, the OMNeT++ network simulator is used to model the telematic effects that occur in the communication process. As an example of the kind of output the proposed architecture can produce, the paper analyses end-to-end performance at the application layer in terms of round-trip latency. Several simulations are performed on a European low-voltage network topology to compute the number of meters that can be polled within 15 minutes. Additionally, one experiment seeks to determine the optimal position of one of the key nodes in PRIME networks: the SWITCH node.
{"title":"Automatic Meter-Reading Simulation through Power Line Communication","authors":"J. Matanza, S. Alexandres, C. Rodríguez-Morcillo","doi":"10.1109/MASCOTS.2013.36","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.36","url":null,"abstract":"This paper proposes a simulation's architecture that allows for the analysis of the performance when using the Power Line Communication's technology. In concrete, it studies the viability of PRIME' standard, to send Automatic Meter Reading (AMR) messages through a low voltage network. In contrast with other studies, physical phenomena-such as background and impulsive noise sources, channel attenuation and multipath effect-are taken into account by Mat lab simulations. Additionally, OMNeT++ network simulator is used to model the telematic effects that occur in the communication process. As an example of the kind of output that can be obtained by the proposed architecture, the paper analyses the end-to-end's performance at application layer in terms of round-trip latency. Several simulations are performed in a European low-voltage network topology to compute the number of meters that can be polled within 15 minutes. Additionally, one experiment tries to determine the optimal position of one of the key nodes in PRIME's networks: the SWITCH node.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134559750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cellular networks have witnessed the emergence of HTTP Adaptive Streaming (HAS) as a new video delivery method. In HAS, several qualities of the same video are made available in the network so that clients can choose the quality that best fits their bandwidth capacity. This has particular implications for caching strategies with respect to viewing patterns and the switching behavior between video qualities. In this paper, we present an analysis of a real HAS dataset collected in France and provided by the country's largest mobile phone operator. First, we analyse the viewing patterns of HAS content and the distribution of the encoding bit rates requested by mobile clients. Second, we give an in-depth analysis of the switching patterns between video bit rates during a video session and assess their implications for caching efficiency. We also model this switching behavior based on empirical observations. Finally, we propose WA-LRU, a new caching algorithm tailored to HAS content, and compare it to standard LRU. Our evaluations demonstrate that WA-LRU outperforms LRU and achieves its goals.
{"title":"HTTP Adaptive Streaming in Mobile Networks: Characteristics and Caching Opportunities","authors":"Ali Gouta, D. Hong, Anne-Marie Kermarrec, Yannick Le Louédec","doi":"10.1109/MASCOTS.2013.17","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.17","url":null,"abstract":"Cellular networks have witnessed the emergence of the HTTP Adaptive Streaming (HAS) as a new video delivery method. In HAS, several qualities of the same videos are made available in the network so that clients can choose the best quality that fits their bandwidth capacity. This has particular implications on caching strategies with respect to the viewing patterns and the switching behavior between video qualities. In this paper we present analysis of a real HAS dataset collected in France and provided by the country's largest mobile phone operator. Firstly, we analyse the viewing patterns of HAS contents and the distribution of the encoding bit rates requested by mobile clients. Secondly, we give an in-depth analysis of the switching pattern between video bit rates during a video session and assess the implication on the caching efficiency. We also model this switching based on empirical observations. Finally, we propose WA-LRU a new caching algorithm tailored for HAS contents and compare it to the standard LRU. Our evaluations demonstrate that WA-LRU performs better and achieves its goals.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131568931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-core computers are becoming increasingly ubiquitous. Understanding and being able to predict the performance of applications that run on such machines is paramount. This paper first shows experimentally that memory contention resulting from multiple cores accessing shared memory can become a significant component of an application's execution time. Then, the paper develops an approximate single-class analytic performance model that captures the effect of memory contention. The model is validated through measurements taken on a micro-benchmark and on well-known Unix memory benchmark programs on machines with 4, 12, and 16 cores. The paper also shows that the predictions differ significantly when memory contention is not considered.
{"title":"Analytic Models of Applications in Multi-core Computers","authors":"Shouvik Bardhan, D. Menascé","doi":"10.1109/MASCOTS.2013.43","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.43","url":null,"abstract":"Multi-core computers are becoming increasingly ubiquitous. Understanding and being able to predict the performance of applications that run on such machines is paramount. This paper first shows experimentally that memory contention resulting from multiple cores accessing shared memory can become a significant component of an application's execution time. Then, the paper develops an approximate single-class analytic performance model that captures the effect of memory contention. The model is validated through measurements taken on a micro-benchmark and on well known Unix memory benchmark programs on machines with 4, 12, and 16 cores. The paper also shows that there is a significant difference in the predictions when memory contention is not considered.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128047102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debugging wireless sensor network (WSN) applications is complicated for multiple reasons, among which the lack of visibility is one of the most challenging. To address this issue, in this paper we present a systematic approach to record and replay WSN applications at the granularity of individual instructions. This approach differs from previous ones in that it is purely software-based; therefore, no additional hardware component is needed. Our key idea is to combine the static, structural information of the assembly-level code with its dynamic, run-time traces, as measured by timestamps and basic block counters, so that we can faithfully infer and replay the actual execution paths of applications at the instruction level in a post-mortem manner. The evaluation results show that this approach is feasible despite the resource constraints of sensor nodes. We also provide two case studies to demonstrate that our instruction-level record-and-replay approach can be used to (1) discover the randomness of EEPROM write times and (2) localize stack-smashing bugs in sensor network applications.
{"title":"Towards Instruction Level Record and Replay of Sensor Network Applications","authors":"Lipeng Wan, Qing Cao","doi":"10.1109/MASCOTS.2013.69","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.69","url":null,"abstract":"Debugging wireless sensor network (WSN) applications has been complicated for multiple reasons, among which the lack of visibility is one of the most challenging. To address this issue, in this paper, we present a systematic approach to record and replay WSN applications at the granularity of instructions. This approach differs from previous ones in that it is purely software based, therefore, no additional hardware component is needed. Our key idea is to combine the static, structural information of the assembly-level code with their dynamic, run-time traces as measured by timestamps and basic block counters, so that we can faithfully infer and replay the actual execution paths of applications at instruction level in a post-mortem manner. The evaluation results show that this approach is feasible despite of the resource constraints of sensor nodes. We also provide two case studies to demonstrate that our instruction level record-and-replay approach can be used to: (1) discover randomness of EEPROM writing time, (2) localize stack smashing bugs in sensor network applications.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125360173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Server virtualization is a key technology for sharing physical resources efficiently and flexibly. With the increasing popularity of I/O-intensive applications, however, the virtualized storage used in shared environments can easily become a bottleneck and cause performance and scalability issues. Performance modeling and evaluation techniques applied prior to system deployment help to avoid such issues. In current practice, however, virtualized storage and its effects on overall system performance are often neglected or treated as a black box. In this paper, we present a systematic I/O performance modeling approach for virtualized storage systems based on queueing theory. We first propose a general performance model building methodology. Then, we demonstrate our methodology by creating I/O queueing models of a representative real-world environment based on IBM System z and IBM DS8700 server hardware. Finally, we present an in-depth evaluation of our models considering both interpolation and extrapolation scenarios as well as scenarios with multiple virtual machines. Overall, we effectively create performance models with less than 11% mean prediction error in the worst case and less than 5% prediction error on average.
{"title":"I/O Performance Modeling of Virtualized Storage Systems","authors":"Qais Noorshams, Kiana Rostami, Samuel Kounev, P. Tůma, Ralf H. Reussner","doi":"10.1109/MASCOTS.2013.20","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.20","url":null,"abstract":"Server virtualization is a key technology to share physical resources efficiently and flexibly. With the increasing popularity of I/O-intensive applications, however, the virtualized storage used in shared environments can easily become a bottleneck and cause performance and scalability issues. Performance modeling and evaluation techniques applied prior to system deployment help to avoid such issues. In current practice, however, virtualized storage and its effects on the overall system performance are often neglected or treated as a black-box. In this paper, we present a systematic I/O performance modeling approach for virtualized storage systems based on queueing theory. We first propose a general performance model building methodology. Then, we demonstrate our methodology creating I/O queueing models of a real-world representative environment based on IBM System z and IBM DS8700 server hardware. Finally, we present an in-depth evaluation of our models considering both interpolation and extrapolation scenarios as well as scenarios with multiple virtual machines. Overall, we effectively create performance models with less than 11% mean prediction error in the worst case and less than 5% prediction error on average.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134173010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The reliability of data storage systems is adversely affected by the presence of latent sector errors. As the number of occurrences of such errors increases with the storage capacity, latent sector errors have become more prevalent in today's high-capacity storage devices. Such errors are typically not detected until an attempt is made to read the affected sectors. When a latent sector error is detected, the redundant data corresponding to the affected sector is used to recover its data. However, if no such redundant data is available, then the data of the affected sector is irrecoverably lost from the storage system. Therefore, the reliability of data storage systems is affected by both the complete failure of storage nodes and the latent sector errors within them. In this article, closed-form expressions for the mean time to data loss (MTTDL) of erasure-coded storage systems in the presence of latent errors are derived. The effect of latent errors on systems with various types of redundancy, data placement, and sector error probabilities is studied. For small latent sector error probabilities, it is shown that the MTTDL is reduced by a factor that is independent of the number of parities in the data redundancy scheme as well as the number of nodes in the system. However, for large latent sector error probabilities, the MTTDL is similar to that of a system using a data redundancy scheme with one parity fewer. The reduction in MTTDL in the latter case is more pronounced than in the former.
{"title":"Effect of Latent Errors on the Reliability of Data Storage Systems","authors":"V. Venkatesan, I. Iliadis","doi":"10.1109/MASCOTS.2013.38","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.38","url":null,"abstract":"The reliability of data storage systems is adversely affected by the presence of latent sector errors. As the number of occurrences of such errors increases with the storage capacity, latent sector errors have become more prevalent in today's high capacity storage devices. Such errors are typically not detected until an attempt is made to read the affected sectors. When a latent sector error is detected, the redundant data corresponding to the affected sector is used to recover its data. However, if no such redundant data is available, then the data of the affected sector is irrecoverably lost from the storage system. Therefore, the reliability of data storage systems is affected by both the complete failure of storage nodes and the latent sector errors within them. In this article, closed-form expressions for the mean time to data loss (MTTDL) of erasure coded storage systems in the presence of latent errors are derived. The effect of latent errors on systems with various types of redundancy, data placement, and sector error probabilities is studied. For small latent sector error probabilities, it is shown that the MTTDL is reduced by a factor that is independent of the number of parities in the data redundancy scheme as well as the number of nodes in the system. However, for large latent sector error probabilities, the MTTDL is similar to that of a system using a data redundancy scheme with one parity less. The reduction of the MTTDL in the latter case is more pronounced than in the former one.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"143 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120886526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The IT industry needs systems management models that leverage available application information to detect quality of service, scalability, and health of service. Ideally, such a technique would apply uniformly to varying application types with different n-tier architectures under normal production conditions of varying load, user-session traffic, transaction type, transaction mix, and hosting environment. This paper shows that a whole-of-service measurement paradigm, utilizing a black-box M/M/1 queueing model and autoregressive curve fitting of the associated CDF, accurately characterizes system performance signatures. This modeling method is used to detect application slowdown events. The method did not rely on customizations specific to the n-tier architecture of the systems being analyzed, so the performance anomaly detection technique was shown to be platform- and configuration-agnostic.
{"title":"\"The Tail Wags the Dog\": A Study of Anomaly Detection in Commercial Application Performance","authors":"Richard Gow, S. Venugopal, P. Ray","doi":"10.1109/MASCOTS.2013.51","DOIUrl":"https://doi.org/10.1109/MASCOTS.2013.51","url":null,"abstract":"The IT industry needs systems management models that leverage available application information to detect quality of service, scalability and health of service. Ideally this technique would be common for varying application types with different n-tier architectures under normal production conditions of varying load, user session traffic, transaction type, transaction mix, and hosting environment. This paper shows that a whole of service measurement paradigm utilizing a black box M/M/1 queuing model and auto regression curve fitting of the associated CDF are an accurate model to characterize system performance signatures. This modeling method is used to detect application slow down events. The method did not rely on customizations specific to the n-tier architecture of the systems being analyzed and so the performance anomaly detection technique was shown to be platform and configuration agnostic.","PeriodicalId":385538,"journal":{"name":"2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128868886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}