A general algorithm to compute the steady-state solution of product-form cooperating Markov chains
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366744
A. Marin, S. R. Bulò
In the last few years, several new results about product-form solutions of stochastic models have been formulated. In particular, the Reversed Compound Agent Theorem (RCAT) and its extensions play a pivotal role in the characterization of cooperating stochastic models in product-form. Although these results have been used to prove several well-known theorems (e.g., the Jackson queueing network and G-network solutions) as well as novel ones, to the best of our knowledge an automatic tool that derives the product-form solution (when one exists) of a generic cooperation among a set of stochastic processes has not yet been developed. In this paper we address the problem of solving the non-linear system of equations that arises from the application of RCAT. We present an iterative algorithm that forms the basis of a software tool currently under development. We illustrate the algorithm, discuss its convergence and complexity, and compare it with previous algorithms defined for the analysis of Jackson networks and G-networks. Several tests have been conducted involving the solution, in product-form by RCAT, of an arbitrarily large number of cooperating processes.
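The abstract does not spell out the iteration itself, so as a rough illustration only, the sketch below solves Jackson-style traffic equations by successive substitution, the same iterate-until-fixed-point structure that a solver for the non-linear RCAT rate equations would use. The function names, the update rule, and the example network are our own assumptions, not the authors' algorithm.

```python
import numpy as np

# Illustrative sketch only: successive substitution for traffic equations of
# the form x_i = gamma_i + sum_j x_j * P[j, i]. The systems produced by RCAT
# are non-linear, but the iterate-until-fixed-point structure is the same.
def fixed_point(gamma, P, tol=1e-12, max_iter=100_000):
    x = np.array(gamma, dtype=float)          # initial guess: external arrivals only
    for _ in range(max_iter):
        x_new = gamma + x @ P                 # apply the traffic equations once
        if np.max(np.abs(x_new - x)) < tol:   # converged to a fixed point
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Example: 3-queue tandem network, external arrival rate 1.0 at queue 0.
gamma = np.array([1.0, 0.0, 0.0])
P = np.array([[0.0, 1.0, 0.0],                # routing probabilities (row -> column)
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
print(fixed_point(gamma, P))                  # all arrival rates converge to 1.0
```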
{"title":"A general algorithm to compute the steady-state solution of product-form cooperating Markov chains","authors":"A. Marin, S. R. Bulò","doi":"10.1109/MASCOT.2009.5366744","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366744","url":null,"abstract":"In the last few years several new results about product-form solutions of stochastic models have been formulated. In particular, the Reversed Compound Agent Theorem (RCAT) and its extensions play a pivotal role in the characterization of cooperating stochastic models in product-form. Although these results have been used to prove several well-known theorems (e.g., Jackson queueing network and G-network solutions) as well as novel ones, to the best of our knowledge, an automatic tool to derive the product-form solution (if present) of a generic cooperation among a set of stochastic processes, is not yet developed. In this paper we address the problem of solving the non-linear system of equations that arises from the application of RCAT. We present an iterative algorithm that is the base of a software tool currently under development. We illustrate the algorithm, discuss the convergence and the complexity, compare it with previous algorithms defined for the analysis of the Jackson networks and the G-networks. Several tests have been conducted involving the solutions of a (arbitrary) large number of cooperating processes in product-form by RCAT.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128119306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning based address mapping for improving the performance of memory subsystems
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366234
Pratyush Kumar, M. Desai
Interleaved address mapping has been used effectively to improve the performance of parallel-accessible memory subsystems. We propose a generalization of such mappings and study it in the framework of application-specific MPSoCs. In this generalization, a section of the address bits is used to map each address to a memory bank, and to a row within that bank, using a Look-Up Table (LUT). We model the problem of address-mapping optimization as a Markov Decision Process (MDP). To solve the MDP, we propose a reinforcement-learning-based algorithm which learns an optimized mapping within the generalized class for a specific application mapped to an MPSoC system. Through cycle-accurate simulations on a framework developed specifically for this study, we demonstrate that a system using an address mapping generated in this manner exhibits substantially higher performance than the same system using interleaved address mappings. These results indicate that application and architecture visibility can be leveraged to obtain better mappings than generic interleaved solutions, and that an automated reinforcement-learning approach can identify such mappings using only the run-time behaviour of the system.
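As a hedged illustration of the generalized mapping class, the sketch below shows a toy LUT-based translation from a physical address to a (bank, row-group, offset) triple; the bit-field widths and identity initialisation are our assumptions, not the paper's configuration. Plain interleaving is recovered as the special case of an identity LUT, and a learning agent would search over the LUT contents.

```python
# Toy LUT-generalized address mapping (our own bit layout, not the paper's).
# A slice of the address bits indexes a LUT whose entries name a
# (bank, row-group); the low-order bits remain the within-row offset.
NUM_BANKS   = 8
LUT_BITS    = 10              # address bits used to index the LUT
OFFSET_BITS = 6               # low-order bits kept as the offset

def identity_lut():
    # Reproduces classic bank interleaving: consecutive entries cycle over banks.
    return [(i % NUM_BANKS, i // NUM_BANKS) for i in range(1 << LUT_BITS)]

def map_address(addr, lut):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index  = (addr >> OFFSET_BITS) & ((1 << LUT_BITS) - 1)
    bank, row_group = lut[index]
    return bank, row_group, offset

lut = identity_lut()
print(map_address(0x1A40, lut))   # (1, 13, 0) under the identity LUT
# A learning agent would permute LUT entries (the MDP actions) and keep the
# permutation that minimizes observed bank conflicts for the application.
```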
{"title":"Learning based address mapping for improving the performance of memory subsystems","authors":"Pratyush Kumar, M. Desai","doi":"10.1109/MASCOT.2009.5366234","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366234","url":null,"abstract":"Interleaved address mapping has been effectively used to improve the performance of a parallely accessible memory subsystem. We propose a generalization of such mappings and study them in the framework of application specific MPSoCs. In this generalization, a section of the address bits is used to map each address to a memory bank and a row within that bank, using a Look-Up Table(LUT). We model the problem of address mapping optimization as a Markov Decision Process (MDP). To solve the MDP, we propose a reinforcement learning based algorithm which learns an optimized mapping within the generalized class, for a specific application mapped to an MPSoC system. Through cycle-accurate simulations on a simulation framework specifically developed for such a study, we demonstrate that a system using an address mapping generated in this manner exhibits substantially higher performance when compared to the same system using interleaved address mappings. These results indicate that application and architecture visibility can be leveraged to obtain better mappings than generic interleaved solutions, and that an automated reinforcement learning approach can identify such mappings using only the run-time behaviour of the system.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114666713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Price war with migrating customers
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5362674
P. Maillé, M. Naldi, B. Tuffin
In the telecommunication world, competition among providers to attract and keep customers is fierce. Customers, in turn, churn between providers in search of better prices, better reputation, or better services. In this paper we study the price war between two providers in the case where users' decisions are modeled by a Markov chain with price-dependent transition rates. Each provider is assumed to seek a maximal revenue, which depends on the strategy of its competitor. Using the framework of non-cooperative game theory, we show how the price war can be analyzed and illustrate the influence of various parameters.
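To make the setup concrete, here is a toy numerical rendering of such a model; the linear price-dependent rate function is an assumption of ours, not the paper's. The stationary distribution of the two-state chain gives each provider's market share, from which revenues, and hence best responses in the price game, follow.

```python
# Toy two-provider churn model (rate function is our own assumption):
# customers move between providers 1 and 2 as a two-state Markov chain
# whose transition rates grow with the price difference.
def churn_rate(p_own, p_other, base=0.1, sensitivity=0.5):
    # rate of leaving the provider charging p_own for the one charging p_other
    return base + sensitivity * max(p_own - p_other, 0.0)

def market_shares(p1, p2):
    r12 = churn_rate(p1, p2)        # rate provider 1 -> provider 2
    r21 = churn_rate(p2, p1)        # rate provider 2 -> provider 1
    pi1 = r21 / (r12 + r21)         # stationary distribution of the 2-state chain
    return pi1, 1.0 - pi1

def revenues(p1, p2, customers=1000):
    s1, s2 = market_shares(p1, p2)
    return customers * s1 * p1, customers * s2 * p2

# Each provider picks its price as a best response to the other's; a Nash
# equilibrium of the price war is a pair of mutual best responses.
print(revenues(10.0, 12.0))
```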
{"title":"Price war with migrating customers","authors":"P. Maillé, M. Naldi, B. Tuffin","doi":"10.1109/MASCOT.2009.5362674","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5362674","url":null,"abstract":"In the telecommunication world, competition among providers to attract and keep customers is fierce. On the other hand customers churn between providers due to better prices, better reputation or better services. We propose in this paper to study the price war between two providers in the case where users' decisions are modeled by a Markov chain, with price-dependent transition rates. Each provider is assumed to look for a maximized revenue, which depends on the strategy of the competitor. Therefore, using the framework of non-cooperative game theory, we show how the price war can be analyzed and show the influence of various parameters.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125573086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extreme Binning: Scalable, parallel deduplication for chunk-based file backup
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366623
Deepavali Bhagwat, K. Eshghi, D. Long, Mark Lillibridge
Data deduplication is an essential and critical component of backup systems: essential because it reduces storage space requirements, and critical because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require in order to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads made up of individual files with no locality among consecutive files in a given window of time. Due to this lack of locality, existing techniques perform poorly on such workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which yields reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated to exactly one node by a stateless routing algorithm, allowing maximum parallelization, and each backup node is autonomous, with no dependencies across nodes, which makes data management tasks robust and low-overhead.
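A minimal sketch of the routing idea follows, using fixed-size chunking and SHA-1 chunk IDs for brevity (production systems use content-defined chunking, and the paper's exact choice of representative and bin layout may differ). Because similar files are likely to share their minimum chunk ID, they are routed to the same node and bin without any shared state.

```python
import hashlib

# Sketch of similarity-based routing in the spirit of Extreme Binning.
# Fixed-size chunking is used here for brevity; real deduplication systems
# use content-defined chunking so chunk boundaries survive insertions.
def chunk_ids(data, chunk_size=4096):
    return [hashlib.sha1(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]

def backup_node_for(data, num_nodes):
    ids = chunk_ids(data)
    representative = min(ids)   # min-hash style choice: similar files tend to
                                # share their minimum chunk ID, so they land on
                                # the same node without any shared routing state
    return int.from_bytes(representative[:8], "big") % num_nodes

print(backup_node_for(b"example file contents" * 1000, num_nodes=4))
```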
{"title":"Extreme Binning: Scalable, parallel deduplication for chunk-based file backup","authors":"Deepavali Bhagwat, K. Eshghi, D. Long, Mark Lillibridge","doi":"10.1109/MASCOT.2009.5366623","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366623","url":null,"abstract":"Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117235741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A simulation approach to evaluating design decisions in MapReduce setups
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366973
Guanying Wang, A. Butt, P. Pandey, Karan Gupta
MapReduce has emerged as a model of choice for supporting modern data-intensive applications. The model is easy to use and promising in reducing time-to-solution. It is also a key enabler for cloud computing, which provides transparent and flexible access to a large number of compute, storage, and networking resources. Setting up and operating a large MapReduce cluster entails careful evaluation of various design choices and run-time parameters to achieve high efficiency. However, this design space has not been explored in detail. In this paper, we adopt a simulation approach to systematically understand the performance of MapReduce setups. The resulting simulator, MRPerf, captures such aspects of these setups as node, rack, and network configurations, disk parameters and performance, data layout, and application I/O characteristics, among others, and uses this information to predict expected application performance. Specifically, we use MRPerf to explore the effect of several interconnect topologies, data locality, and software and hardware failures on overall application performance. MRPerf allows us to quantify the effect of these factors, and thus can serve as a tool for optimizing existing MapReduce setups as well as designing new ones.
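MRPerf itself performs detailed simulation; purely to illustrate the kind of configuration it consumes (node counts, disk and network speeds, data layout), here is a crude back-of-the-envelope phase model. Every parameter and formula below is a placeholder of ours, not MRPerf's model.

```python
# Crude back-of-the-envelope model of a MapReduce job, only to illustrate the
# kind of inputs a simulator like MRPerf takes. The formulas are placeholders,
# not MRPerf's actual packet- and disk-level simulation.
def estimate_job_time(input_gb, nodes, map_slots_per_node,
                      disk_mb_s=80.0, net_mb_s=100.0, reduce_fraction=0.3):
    slots = nodes * map_slots_per_node
    map_waves = -(-input_gb * 1024 // (64 * slots))   # 64 MB splits, ceil division
    map_time = map_waves * 64 / disk_mb_s             # disk-bound map phase
    shuffle_mb = input_gb * 1024 * reduce_fraction
    shuffle_time = shuffle_mb / (net_mb_s * nodes)    # network-bound shuffle
    reduce_time = shuffle_mb / (disk_mb_s * nodes)    # disk-bound reduce phase
    return map_time + shuffle_time + reduce_time      # seconds

print(f"{estimate_job_time(input_gb=100, nodes=20, map_slots_per_node=2):.0f} s")
```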
{"title":"A simulation approach to evaluating design decisions in MapReduce setups","authors":"Guanying Wang, A. Butt, P. Pandey, Karan Gupta","doi":"10.1109/MASCOT.2009.5366973","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366973","url":null,"abstract":"MapReduce has emerged as a model of choice for supporting modern data-intensive applications. The model is easy-to-use and promising in reducing time-to-solution. It is also a key enabler for cloud computing, which provides transparent and flexible access to a large number of compute, storage and networking resources. Setting up and operating a large MapReduce cluster entails careful evaluation of various design choices and run-time parameters to achieve high efficiency. However, this design space has not been explored in detail. In this paper, we adopt a simulation approach to systematically understanding the performance of MapReduce setups. The resulting simulator, MRPerf, captures such aspects of these setups as node, rack and network configurations, disk parameters and performance, data layout and application I/O characteristics, among others, and uses this information to predict expected application performance. Specifically, we use MRPerf to explore the effect of several component inter-connect topologies, data locality, and software and hardware failures on overall application performance. MR-Perf allows us to quantify the effect of these factors, and thus can serve as a tool for optimizing existing MapReduce setups as well as designing new ones.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125970446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mining for statistical models of availability in large-scale distributed systems: An empirical study of SETI@home
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5367061
B. Javadi, Derrick Kondo, J. Vincent, David P. Anderson
In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example, stationary versus non-stationary behavior) and fits different models (for example, Exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modelled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real large-scale Internet-distributed system, namely SETI@home. We find that about 34% of hosts exhibit availability that is a truly random process, and that these hosts can often be modelled accurately with a few distinct distributions from different families. We believe that this characterization is fundamental to the design of stochastic scheduling algorithms for large-scale systems where host availability is uncertain.
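A simplified sketch of the per-host model-selection step is given below: fit each candidate family by maximum likelihood and keep the best-scoring one. The candidate set matches the families named above, but the likelihood-based selection rule is our assumption; the paper also tests whether a host's availability is truly random before any fitting is attempted.

```python
import numpy as np
from scipy import stats

# Sketch of per-host model selection (our simplification): fit each candidate
# family to one host's availability durations by maximum likelihood and keep
# the family with the highest log-likelihood.
CANDIDATES = {
    "exponential": stats.expon,
    "weibull":     stats.weibull_min,
    "pareto":      stats.pareto,
}

def best_family(durations):
    best_name, best_ll, best_params = None, -np.inf, None
    for name, dist in CANDIDATES.items():
        params = dist.fit(durations)                   # MLE fit
        ll = np.sum(dist.logpdf(durations, *params))   # goodness score
        if ll > best_ll:
            best_name, best_ll, best_params = name, ll, params
    return best_name, best_params

sample = stats.weibull_min.rvs(0.8, scale=10.0, size=500, random_state=1)
print(best_family(sample)[0])   # usually "weibull" on this synthetic sample
```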
{"title":"Mining for statistical models of availability in large-scale distributed systems: An empirical study of SETI@home","authors":"B. Javadi, Derrick Kondo, J. Vincent, David P. Anderson","doi":"10.1109/MASCOT.2009.5367061","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5367061","url":null,"abstract":"In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibit different statistical properties (for example stationary versus non-stationary behavior) and fit different models (for example Exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability have similar statistical properties and can be modelled with similar probability distributions. We apply this method with about 230,000 host availability traces obtained from a real large-scale Internet-distributed system, namely SETI@home. We find that about 34% of hosts exhibit availability that is a truly random process, and that these hosts can often be modelled accurately with a few distinct distributions from different families. We believe that this characterization is fundamental in the design of stochastic scheduling algorithms across large-scale systems where host availability is uncertain.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130379319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing and optimizing a data protection solution
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5367043
L. Cherkasova, R. Lau, Harald Burose, Bernhard Kappler
Analyzing and managing large amounts of unstructured information is a high-priority task for many companies. To implement content management solutions, companies need a comprehensive view of their unstructured data. In order to provide a new level of intelligence and control over data resident within the enterprise, one needs to build a chain of tools and automated processes that enable the evaluation, analysis, and visibility of information assets and their dynamics during the information life-cycle. We propose a novel framework that utilizes the existing backup infrastructure by integrating additional content analysis routines and extracting already-available filesystem metadata over time. This information is used for data analysis and trending, adding performance optimization and self-management capabilities to backup and information management tasks. Backup management faces serious challenges of its own: processing an ever-increasing amount of data while meeting the timing constraints of backup windows may require adaptive changes to backup scheduling routines. We revisit traditional backup job scheduling and demonstrate that random job scheduling may lead to inefficient backup processing and increased backup time. In this work, we use historical information about object backup processing times and propose an alternative job scheduling with automated parameter tuning that can significantly reduce the overall backup time. Under this scheduling, called LBF, the longest backups (the objects with the longest backup times) are scheduled first. We evaluate the performance benefits of the proposed scheduling using a realistic workload collected from seven backup servers at HP Labs. A significant reduction of the backup time (up to 30%) and improved quality of service can be achieved under the proposed job assignment policy.
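LBF as described is essentially a longest-processing-time-first heuristic; the sketch below is our rendering of that idea, with historical durations standing in for the paper's measured object backup times and a least-loaded assignment over a fixed number of concurrent backup sessions (the paper's scheduler and tuned parameters may differ in detail).

```python
import heapq

# Sketch of the LBF idea: dispatch the objects with the longest historical
# backup times first, each to the earliest-available concurrent session.
def lbf_assign(durations, num_sessions):
    sessions = [(0.0, s) for s in range(num_sessions)]   # (current load, id)
    heapq.heapify(sessions)
    plan = []
    for obj, dur in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, s = heapq.heappop(sessions)                # least-loaded session
        plan.append((obj, s, load))                      # start time = load
        heapq.heappush(sessions, (load + dur, s))
    makespan = max(load for load, _ in sessions)
    return plan, makespan

durations = {"db": 240, "mail": 180, "wiki": 60, "home": 300, "logs": 90}
plan, makespan = lbf_assign(durations, num_sessions=2)
print(makespan)   # 450 with LBF vs. up to 570 with an unlucky random order
```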
{"title":"Enhancing and optimizing a data protection solution","authors":"L. Cherkasova, R. Lau, Harald Burose, Bernhard Kappler","doi":"10.1109/MASCOT.2009.5367043","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5367043","url":null,"abstract":"Analyzing and managing large amounts of unstructured information is a high priority task for many companies. For implementing content management solutions, companies need a comprehensive view of their unstructured data. In order to provide a new level of intelligence and control over data resident within the enterprise, one needs to build a chain of tools and automated processes that enable the evaluation, analysis, and visibility into information assets and their dynamics during the information life-cycle. We propose a novel framework to utilize the existing backup infrastructure by integrating additional content analysis routines and extracting already available filesystem metadata over time. This is used to perform data analysis and trending to add performance optimization and self-management capabilities to backup and information management tasks. Backup management faces serious challenges on its own: processing ever increasing amount of data while meeting the timing constraints of backup windows could require adaptive changes in backup scheduling routines. We revisit a traditional backup job scheduling and demonstrate that random job scheduling may lead to inefficient backup processing and an increased backup time. In this work, we use a historic information about the object backup processing time and suggest an additional job scheduling, and automated parameter tuning which may significantly optimize the overall backup time. Under this scheduling, called LBF, the longest backups (the objects with longest backup time) are scheduled first. We evaluate the performance benefits of the introduced scheduling using a realistic workload collected from the seven backup servers at HP Labs. Significant reduction of the backup time (up to 30%) and improved quality of service can be achieved under the proposed job assignment policy.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126516272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TransPlant: A parameterized methodology for generating transactional memory workloads
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366659
James Poe, C. Hughes, Tao Li
Transactional memory provides a means to bridge the discrepancy between programmer productivity and the difficulty of exploiting the thread-level parallelism offered by emerging chip multiprocessors. Because the hardware has outpaced the software, there are very few modern multithreaded benchmarks available, and even fewer for transactional memory researchers. This hurdle must be overcome for transactional memory research to mature and gain widespread acceptance. Currently, for performance evaluations, most researchers rely on manually converted lock-based multithreaded workloads or on the small group of programs written explicitly for transactional memory. Using converted benchmarks is problematic because they have been tuned so well that they may not be representative of how a programmer will actually use transactional memory. Hand-coding stressor benchmarks is unattractive because it is tedious and time-consuming. A parameterized methodology that can automatically generate a program based on desired high-level program characteristics therefore benefits the transactional memory community. In this work, we propose techniques to generate parameterized transactional memory benchmarks based on a feature set, decoupled from the underlying transactional model. Using principal component analysis, clustering, and raw transactional performance metrics, we show that TransPlant can generate benchmarks with features that lie outside the boundary occupied by traditional benchmarks. We also show how TransPlant can mimic the behavior of SPLASH-2 and STAMP transactional memory workloads. The program generation methods proposed here will help transactional memory architects select a robust set of programs for quick design evaluations.
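For illustration, the snippet below shows a workload-space analysis of the kind the abstract describes: standardize benchmark feature vectors, project them with principal component analysis, and cluster the projection to see which regions of the space existing suites cover. The library choice, feature dimensionality, and cluster count are our assumptions, not TransPlant's pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in for measured benchmark features (e.g., transaction length,
# read/write-set size, contention); 30 benchmarks x 6 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 6))

X = StandardScaler().fit_transform(features)   # zero mean, unit variance
Z = PCA(n_components=2).fit_transform(X)       # 2-D view of the workload space
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
for cluster in range(4):
    print(cluster, np.where(labels == cluster)[0])   # members of each cluster
```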
{"title":"TransPlant: A parameterized methodology for generating transactional memory workloads","authors":"James Poe, C. Hughes, Tao Li","doi":"10.1109/MASCOT.2009.5366659","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366659","url":null,"abstract":"Transactional memory provides a means to bridge the discrepancy between programmer productivity and the difficulty in exploiting thread-level parallelism gains offered by emerging chip multiprocessors. Because the hardware has outpaced the software, there are very few modern multithreaded benchmarks available and even fewer for transactional memory researchers. This hurdle must be overcome for transactional memory research to mature and to gain widespread acceptance. Currently, for performance evaluations, most researchers rely on manually converted lock-based multithreaded workloads or the small group of programs written explicitly for transactional memory. Using converted benchmarks is problematic because they have been tuned so well that they may not be representative of how a programmer will actually use transactional memory. Hand coding stressor benchmarks is unattractive because it is tedious and time consuming. A new parameterized methodology that can automatically generate a program based on the desired high-level program characteristics benefits the transactional memory community. In this work, we propose techniques to generate parameterized transactional memory benchmarks based on a feature set, decoupled from the underlying transactional model. Using principle component analysis, clustering, and raw transactional performance metrics, we show that TransPlant can generate benchmarks with features that lie outside the boundary occupied by these traditional benchmarks. We also show how TransPlant can mimic the behavior of SPLASH-2 and STAMP transactional memory workloads. The program generation methods proposed here will help transactional memory architects select a robust set of programs for quick design evaluations.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131123440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance evaluation of scheduling policies in symmetric multiprocessing environments
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366656
J. Happe, Henning Groenda, Ralf H. Reussner
The shift of hardware architecture towards parallel execution has led to broad usage of multi-core processors in desktop and server systems. The benefit of additional processor cores for software performance depends on the software's parallelism as well as on the operating system scheduler's capabilities. In particular, the load on the available processors (or cores) strongly influences the response times and throughput of software applications. Hence, a sound understanding of the mutual influence of software behaviour and operating system schedulers is essential for accurate performance evaluations. Multi-core systems pose new challenges for performance analysis and for developers of operating systems. For example, an optimal scheduling policy for multi-server systems, analogous to shortest remaining processing time (SRPT) for single-server systems, is not yet known in queueing theory. In this paper, we present a detailed experimental evaluation of general purpose operating system (GPOS) schedulers in symmetric multiprocessing (SMP) environments. In particular, we are interested in the influence of multiprocessor load balancing on software performance. Additionally, the evaluation includes effects of GPOS schedulers that can also occur in single-processor environments, such as the I/O-boundedness of tasks and different prioritisation strategies. The results presented in this paper provide the basis for the future development of more accurate performance models of today's software systems.
{"title":"Performance evaluation of scheduling policies in symmetric multiprocessing environments","authors":"J. Happe, Henning Groenda, Ralf H. Reussner","doi":"10.1109/MASCOT.2009.5366656","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366656","url":null,"abstract":"The shift of hardware architecture towards parallel execution led to a broad usage of multi-core processors in desktop systems and in server systems. The benefit of additional processor cores for software performance depends on the software's parallelism as well as the operating system scheduler's capabilities. Especially, the load on the available processors (or cores) strongly influences response times and throughput of software applications. Hence, a sophisticated understanding of the mutual influence of software behaviour and operating system schedulers is essential for accurate performance evaluations. Multi-core systems pose new challenges for performance analysis and developers of operating systems. For example, an optimal scheduling policy for multi-server systems, such as shortest remaining processing time (SRPT) for single-server systems, is not yet known in queueing theory. In this paper, we present a detailed experimental evaluation of general purpose operating system (GPOS) schedulers in symmetric multiprocessing (SMP) environments. In particular, we are interested in the influence of multiprocessor load balancing on software performance. Additionally, the evaluation includes effects of GPOS schedulers that can also occur in single-processor environments, such as I/O-boundedness of tasks and different prioritisation strategies. The results presented in this paper provide the basis for the future development of more accurate performance models of today's software systems.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116196183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Block CLOCK (LB-CLOCK): A write caching algorithm for solid state disks
Pub Date: 2009-12-28. DOI: 10.1109/MASCOT.2009.5366737
Biplob K. Debnath, S. Subramanya, D. Du, D. Lilja
Solid State Disks (SSDs) using NAND flash memory are increasingly being adopted in the high-end servers of datacenters to improve the performance of I/O-intensive applications. Compared to traditional enterprise-class hard disks, SSDs provide faster read performance, lower cooling cost, and higher power efficiency. However, the write performance of a flash-based SSD can be up to an order of magnitude slower than its read performance. Furthermore, frequent write operations degrade the lifetime of flash memory. A nonvolatile cache can greatly help to solve these problems. Although a RAM cache is relatively high in cost, it has successfully eliminated the performance gap between fast CPUs and slow magnetic disks. Similarly, a nonvolatile cache in an SSD can alleviate the disparity between the flash memory's read and write performance. A small write cache that reduces the number of flash block erase operations can lead to substantial performance gains for write-intensive applications and can extend the overall lifetime of flash-based SSDs. This paper presents a novel write caching algorithm, the Large Block CLOCK (LB-CLOCK) algorithm, which considers 'recency' and 'block space utilization' metrics to make cache management decisions. LB-CLOCK dynamically varies the priority between these two metrics to adapt to changes in workload characteristics. Our simulation-based experimental results show that LB-CLOCK outperforms the best known existing flash caching algorithms for a wide range of workloads.
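The paper's dynamic prioritisation between the two metrics is not reproduced here; the sketch below captures only the core victim-selection idea under our simplifications: a CLOCK-style sweep gives recently referenced blocks a second chance, and among the rest the fullest block is destaged, freeing the most cache pages per block write-back.

```python
# Simplified sketch of the LB-CLOCK idea (the dynamic priority between recency
# and utilisation is omitted): the cache is managed per flash block; a sweep
# clears the reference bit of recently used blocks, and among unreferenced
# blocks the one with the highest space utilisation is destaged.
class LBClockSketch:
    def __init__(self, capacity_pages, pages_per_block):
        self.capacity = capacity_pages
        self.ppb = pages_per_block
        self.blocks = {}            # block id -> {"ref": bool, "pages": set}
        self.size = 0

    def write(self, page):
        blk = page // self.ppb
        entry = self.blocks.get(blk)
        if entry is None or page not in entry["pages"]:
            while self.size >= self.capacity:
                self._evict()
            entry = self.blocks.setdefault(blk, {"ref": False, "pages": set()})
            entry["pages"].add(page)
            self.size += 1
        entry["ref"] = True         # recency bit, as in CLOCK

    def _evict(self):
        victim, best_util = None, -1.0
        for blk, e in self.blocks.items():
            if e["ref"]:
                e["ref"] = False    # second chance for recently used blocks
                continue
            util = len(e["pages"]) / self.ppb
            if util > best_util:    # prefer the fullest unreferenced block
                victim, best_util = blk, util
        if victim is None:          # everything was referenced this sweep
            victim = next(iter(self.blocks))
        self.size -= len(self.blocks.pop(victim)["pages"])  # destage to flash

cache = LBClockSketch(capacity_pages=8, pages_per_block=4)
for p in [0, 1, 2, 3, 8, 9, 16, 24, 25, 32, 33, 40, 41]:
    cache.write(p)
print(sorted(cache.blocks))   # [4, 6, 8, 10]: block 2 beat the emptier block 4
```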
{"title":"Large Block CLOCK (LB-CLOCK): A write caching algorithm for solid state disks","authors":"Biplob K. Debnath, S. Subramanya, D. Du, D. Lilja","doi":"10.1109/MASCOT.2009.5366737","DOIUrl":"https://doi.org/10.1109/MASCOT.2009.5366737","url":null,"abstract":"Solid State Disks (SSDs) using NAND flash memory are increasingly being adopted in the high-end servers of datacenters to improve performance of the I/O-intensive applications. Compared to the traditional enterprise class hard disks, SSDs provide faster read performance, lower cooling cost, and higher power efficiency. However, write performance of a flash based SSD can be up to an order of magnitude slower than its read performance. Furthermore, frequent write operations degrade the lifetime of flash memory. A nonvolatile cache can greatly help to solve these problems. Although a RAM cache is relative high in cost, it has successfully eliminated the performance gap between fast CPU and slow magnetic disk. Similarly, a nonvolatile cache in an SSD can alleviate the disparity between the flash memory's read and write performance. A small write cache that reduces the number of flash block erase operations, can lead to substantial performance gain for write-intensive applications and can extend the overall lifetime of flash based SSDs. This paper presents a novel write caching algorithm, the Large Block CLOCK (LB-CLOCK) algorithm, which considers ‘recency’ and ‘block space utilization’ metrics to make cache management decisions. LB-CLOCK dynamically varies the priority between these two metrics to adapt to changes in workload characteristics. Our simulation based experimental results show that LB-CLOCK outperforms the best known existing flash caching algorithms for a wide range of workloads.","PeriodicalId":275737,"journal":{"name":"2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122168409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}