Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366756
J. M. Ewing, D. Menascé
Autonomic computing systems are able to adapt to changing environments (such as changes in workload intensity or component failures) in a way that preserves high-level operational goals, such as service level objectives. This paper focuses on autonomic computing systems that are self-optimizing and self-configuring. More specifically, the paper presents the detailed design of an autonomic load balancer (LB) for multi-tiered Web sites. It is assumed that customers can be categorized into distinct classes (gold, silver, and bronze) according to their business value to the site. While the example used in the paper is that of an auction site, the approach can easily be applied to any other Web site. The autonomic LB is able to dynamically change its request redirection policy as well as its resource allocation policy, which determines the allocation of servers to server clusters, in a way that maximizes a business-oriented utility function. The autonomic LB was evaluated through detailed and comprehensive simulation experiments and was compared against a round-robin LB and against a configuration in which each customer category has a dedicated set of servers. The results show that the autonomic LB outperforms the other load balancing approaches by providing a higher utility for highly dynamic workloads.
Title: "Business-oriented autonomic load balancing for multitiered Web sites"
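A minimal sketch of the kind of utility-maximizing server-allocation search such an autonomic LB could perform. The class parameters, the M/M/m-style response-time approximation, and the sigmoid reward below are illustrative assumptions, not the paper's actual controller:

```python
from itertools import product
import math

# Hypothetical per-class parameters: arrival rate (req/s), mean service
# demand (s), response-time SLO (s), and business weight.
CLASSES = {
    "gold":   {"lam": 40.0, "demand": 0.05, "slo": 0.2, "weight": 5.0},
    "silver": {"lam": 60.0, "demand": 0.05, "slo": 0.5, "weight": 2.0},
    "bronze": {"lam": 80.0, "demand": 0.05, "slo": 1.0, "weight": 1.0},
}
TOTAL_SERVERS = 12

def response_time(lam, demand, servers):
    """Crude M/M/m-style approximation of a cluster's mean response time."""
    util = lam * demand / servers
    if util >= 1.0:
        return math.inf                          # saturated cluster
    return demand / (1.0 - util)

def utility(allocation):
    """Business-oriented utility: weighted SLO satisfaction over all classes."""
    total = 0.0
    for cls, servers in allocation.items():
        p = CLASSES[cls]
        r = response_time(p["lam"], p["demand"], servers)
        if math.isfinite(r):
            x = min(4.0 * (r - p["slo"]) / p["slo"], 50.0)   # cap to avoid overflow
            reward = 1.0 / (1.0 + math.exp(x))   # ~1 when r << SLO, ~0 when r >> SLO
        else:
            reward = 0.0
        total += p["weight"] * reward
    return total

def best_allocation():
    """Exhaustively search ways to split TOTAL_SERVERS among the clusters."""
    names = list(CLASSES)
    best, best_u = None, -math.inf
    for split in product(range(1, TOTAL_SERVERS + 1), repeat=len(names)):
        if sum(split) != TOTAL_SERVERS:
            continue
        alloc = dict(zip(names, split))
        u = utility(alloc)
        if u > best_u:
            best, best_u = alloc, u
    return best, best_u

if __name__ == "__main__":
    alloc, u = best_allocation()
    print(f"best allocation: {alloc}, utility = {u:.3f}")
```

In a real controller the search would be re-run whenever measured arrival rates change, which is what makes the allocation policy adapt to highly dynamic workloads.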
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366812
Sébastien Doirieux, B. Baynat, Thomas Begin
In this paper, we explore a way to find the right scheduling policy for WiMAX networks, one that achieves the best compromise between efficient use of the resource and relative fairness among users. This problem is of primary importance, as no scheduling policy is recommended in the WiMAX standard. To do so, we develop an extension of our previous analytical model for WiMAX networks that takes into account a more general scheduling policy than those previously studied (i.e., instantaneous throughput fairness, slot-sharing fairness, and opportunistic scheduling). We show that this general policy covers the two extreme cases, namely the instantaneous throughput fairness policy and the opportunistic policy, and offers intermediate policies that are good candidates for finding the right trade-off. In order to formulate the decision criterion, we introduce a new performance parameter: the mean throughput obtained by a user as a function of how efficiently it uses the resource. The model has a closed-form solution, and all performance parameters can be obtained instantaneously. This allows us to carry out dimensioning studies that require several thousand evaluations, which would not be tractable with any simulation tool.
Title: "On finding the right balance between fairness and efficiency in WiMAX scheduling through analytical modeling"
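As an illustration only, the sketch below shows one parametric family of slot-sharing policies whose extremes correspond to instantaneous throughput fairness and opportunistic scheduling; the parameter beta and the slot/bit constants are assumptions and do not reproduce the paper's analytical model:

```python
def slot_shares(efficiencies, beta):
    """Share of frame slots per user: s_i proportional to e_i ** beta.

    beta = -1 -> equal instantaneous throughput (more slots to weaker users)
    beta =  0 -> equal slot sharing
    beta large -> opportunistic (slots concentrate on the most efficient user)
    """
    weights = [e ** beta for e in efficiencies]
    total = sum(weights)
    return [w / total for w in weights]

def throughputs(efficiencies, beta, slots_per_frame=100, bits_per_slot=48):
    """Per-user throughput (bits/frame) under the chosen slot-sharing policy."""
    shares = slot_shares(efficiencies, beta)
    return [s * slots_per_frame * e * bits_per_slot
            for s, e in zip(shares, efficiencies)]

if __name__ == "__main__":
    # Hypothetical per-user radio efficiencies (relative bits-per-slot factors).
    eff = [1.0, 0.6, 0.3]
    for beta in (-1.0, 0.0, 1.0, 4.0):
        tp = throughputs(eff, beta)
        print(f"beta={beta:+.1f}  throughputs={[round(t, 1) for t in tp]}")
```

Intermediate beta values are exactly the kind of compromise policy the abstract refers to: more total throughput than strict throughput fairness, less starvation of weak users than pure opportunism.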
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366151
Q. Wei, B. Veeravalli, Zhixiang Li
Disk idle periods in server workloads are short, which significantly limits the effectiveness of underlying disk power management. To overcome this limitation, we present a Cooperative Power Management (CPM) scheme that saves energy with performance guarantees for object-based storage clusters. CPM reclaims idle memory of neighboring Object-based Storage Devices (OSDs) over a high-speed network as a remote cache to store evicted objects. Requests that miss in the local cache can then be served from the remote cache, so the local disk does not necessarily have to spin back up to service them. Hence, CPM can artificially create long idle periods that provide more opportunities for the underlying disk power management. CPM minimizes the risk of performance and energy penalties by spinning down disks only when the predicted idle period is long enough to justify the state-transition energy. Our experimental results demonstrate that CPM can dynamically adapt to workload changes and outperforms existing solutions in terms of energy saving and performance for large-scale OSD clusters.
Title: "CPM: Cooperative power management for object-based storage cluster"
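A minimal sketch of the break-even spin-down decision described in the abstract, assuming hypothetical disk power figures and a given idle-period prediction; CPM's actual prediction mechanism and remote-cache logic are not modeled here:

```python
def break_even_time(idle_power, standby_power, transition_energy):
    """Idle time (s) beyond which spinning down saves energy.

    transition_energy: extra energy (J) spent spinning the disk down and back up.
    """
    return transition_energy / (idle_power - standby_power)

def should_spin_down(predicted_idle_s, idle_power=8.0, standby_power=1.0,
                     transition_energy=70.0):
    """Spin down only if the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_time(idle_power, standby_power,
                                              transition_energy)

if __name__ == "__main__":
    # Hypothetical disk power figures (watts) and transition energy (joules):
    # break-even time is 70 / (8 - 1) = 10 seconds.
    for idle in (5, 15, 30):
        print(f"predicted idle {idle:>2}s -> spin down: {should_spin_down(idle)}")
```

The remote caching in CPM matters precisely because it stretches the predicted idle periods past this break-even point far more often than local caching alone would.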
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5367047
E. Lynch, G. Riley
Historically, large-scale low-lookahead parallel simulation has been a difficult problem. As a solution, we have designed a Global Synchronization Unit (GSU) that would reside centrally on a multi-core chip and asynchronously compute the Lower Bound on Time Stamps (LBTS), the minimum timestamp of all unprocessed events in the simulation, on demand to synchronize conservative parallel simulators. Our GSU also accounts for transient messages, messages that have been sent but not yet processed by their recipient, eliminating the need for the simulator to acknowledge received messages. In this paper we analyze the sensitivity of simulation performance to the time required to access the GSU. The sensitivity analysis revealed that with GSU access times as high as hundreds of cycles, there was still a significant performance advantage over the baseline shared-memory implementation.
Title: "A sensitivity analysis of a new hardware-supported Global Synchronization Unit"
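A simplified software sketch of the LBTS computation such a GSU performs; the per-LP report fields and the send/receive counters used to detect transient messages are illustrative assumptions (the real GSU is an asynchronous hardware unit on a multi-core chip):

```python
import math
from dataclasses import dataclass

@dataclass
class LPState:
    """Per-logical-process state reported to the GSU (illustrative fields)."""
    next_event_time: float = math.inf   # earliest unprocessed local event
    sent_count: int = 0                 # messages sent so far
    recv_count: int = 0                 # messages received so far
    min_sent_time: float = math.inf     # min timestamp among recent sends

class GlobalSyncUnit:
    """Software sketch of the LBTS logic a hardware GSU might implement."""

    def __init__(self, num_lps):
        self.lps = [LPState() for _ in range(num_lps)]

    def report(self, lp_id, next_event_time, sent_count, recv_count, min_sent_time):
        self.lps[lp_id] = LPState(next_event_time, sent_count, recv_count, min_sent_time)

    def lbts(self):
        """Lower Bound on Time Stamps over local events and transient messages.

        If total sends exceed total receives, some messages are still in flight,
        so their minimum send timestamp must also bound the LBTS; no per-message
        acknowledgements are needed.
        """
        bound = min(lp.next_event_time for lp in self.lps)
        in_flight = (sum(lp.sent_count for lp in self.lps)
                     - sum(lp.recv_count for lp in self.lps))
        if in_flight > 0:
            bound = min(bound, min(lp.min_sent_time for lp in self.lps))
        return bound

if __name__ == "__main__":
    gsu = GlobalSyncUnit(num_lps=3)
    gsu.report(0, next_event_time=12.0, sent_count=5, recv_count=5, min_sent_time=math.inf)
    gsu.report(1, next_event_time=9.5,  sent_count=4, recv_count=3, min_sent_time=8.0)
    gsu.report(2, next_event_time=11.0, sent_count=2, recv_count=2, min_sent_time=math.inf)
    print("LBTS =", gsu.lbts())   # 8.0: a transient message bounds the LBTS
```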
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366358
D. Lugones, Daniel Franco, Eduardo Argollo, E. Luque
Modeling interconnection networks is an important research topic, enabling the study of interconnection behavior and its significance in telecommunication applications and distributed systems. However, the complexity of large-scale networks makes the development of models and simulation tools a prohibitively difficult task. In this paper we explore the network modeling design space to provide models following two different approaches: accurate simulation models based on finite state machines (FSMs), and analytical models that provide a profitable speedup with minimal accuracy loss. Experimental results show that the proposed analytical model provides a faithful abstraction for the scale of systems that are of interest in the foreseeable future, reaching an 8% error and a speedup of around 30x over an FSM model.
Title: "Models for high-speed interconnection networks performance analysis"
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366184
Dinesh Kumar, D. Olshefski, Li Zhang
Managing client-perceived pageview response time for multiple classes of service is essential in today's highly competitive e-commerce environment. We present Connection and Performance Model Driven Optimization (CP-MDO), a novel approach for providing optimal QoS as defined by a cost objective based on client-perceived pageview response time and pageview drop rate. Our approach combines two vital models: 1) a latency model for connection establishment that captures the interactions between web browsers and web servers across network protocol layers, and 2) a server performance model based on queueing theory that models performance across all tiers of a server complex. An algorithm that enforces the optimal admission control based on the inter-arrival time between pageview admissions is given. Our approach has been implemented and evaluated in an experimental setting, demonstrating how CP-MDO achieves the minimal cost while providing minimal pageview response times under minimal drop rates across multiple classes of service.
Title: "Connection and performance model driven optimization of pageview response time"
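A minimal sketch of an admission gate driven by a per-class minimum inter-admission time, the mechanism the abstract describes; the class names and gap values are hypothetical, and in CP-MDO the gaps would come from the optimization over the connection and performance models:

```python
import time

class AdmissionGate:
    """Pace pageview admissions so that, per service class, admitted requests
    are at least `min_gap` seconds apart (the gap would come from the optimizer)."""

    def __init__(self, min_gap_by_class):
        self.min_gap = dict(min_gap_by_class)
        self.last_admit = {cls: float("-inf") for cls in self.min_gap}

    def try_admit(self, cls, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_admit[cls] >= self.min_gap[cls]:
            self.last_admit[cls] = now
            return True        # admit the pageview
        return False           # drop (or defer) the pageview

if __name__ == "__main__":
    # Hypothetical per-class minimum inter-admission times (seconds).
    gate = AdmissionGate({"gold": 0.01, "bronze": 0.05})
    admitted = [gate.try_admit("bronze", now=t) for t in (0.00, 0.02, 0.06, 0.07, 0.12)]
    print(admitted)   # [True, False, True, False, True]
```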
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366143
Marga Nácher, C. Calafate, Juan-Carlos Cano, P. Manzoni
Eavesdropping is an important threat in mobile ad-hoc networks because the open air is used as the transmission medium. As a consequence, several works aiming to prevent this threat have been proposed, some of which focus on the use of anonymous routing protocols. In this paper we analyze two of the most popular: ANODR and MASK. We evaluate their performance through simulation in terms of throughput and routing overhead in order to measure the cost of providing anonymity. Simulation results show that these anonymous routing protocols reduce performance to inefficient levels.
Title: "Anonymous routing protocols: Impact on performance in MANETs"
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366825
Avani Wildani, T. Schwarz, E. L. Miller, D. Long
Digital archives are growing rapidly, necessitating stronger reliability measures than RAID to avoid data loss from device failure. Mirroring, a popular solution, is too expensive over time. We present a compromise solution that uses multi-level redundancy coding to reduce the probability of data loss from multiple simultaneous device failures. This approach handles small-scale failures of one or two devices efficiently while still allowing the system to survive rare-event, larger-scale failures of four or more devices. In our approach, each disk is split into a set of fixed-size disklets, which are used to construct reliability stripes. To protect against rare-event failures, reliability stripes are grouped into larger super-groups, each of which has a corresponding super-parity; super-parity is used to recover data only when disk failures overwhelm the redundancy in a single reliability stripe. Super-parity can be stored on a variety of devices, such as NV-RAM and always-on disks, to offset write bottlenecks while still keeping the number of active devices low. Our calculations of failure probabilities show that adding super-parity allows our system to absorb many more disk failures without data loss. Through discrete-event simulation, we found that adding super-groups has a significant impact on mean time to data loss and that rebuilds are slow but not unmanageable. Finally, we showed that robustness against rare events can be achieved for a fraction of the total system cost.
Title: "Protecting against rare event failures in archival systems"
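A toy sketch of two-level parity over a super-group, assuming simple XOR codes and byte-string disklets; the real system's placement, device choices for super-parity, and recovery paths are more involved, and the recovery shown here assumes all other disklets in the super-group survive:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def stripe_parity(disklets):
    """First-level parity for one reliability stripe (covers a single failure)."""
    return xor_blocks(disklets)

def super_parity(stripes):
    """Second-level parity over all disklets of the stripes in one super-group.

    Consulted only when failures exceed a single stripe's own redundancy.
    """
    return xor_blocks([d for stripe in stripes for d in stripe])

def recover_with_super_parity(stripes, missing, sp):
    """Rebuild one lost disklet (stripe i, position j) from the super-parity
    and all surviving disklets in the super-group."""
    survivors = [d for i, stripe in enumerate(stripes)
                 for j, d in enumerate(stripe) if (i, j) != missing]
    return xor_blocks(survivors + [sp])

if __name__ == "__main__":
    # Two reliability stripes of three 4-byte disklets each (toy super-group).
    stripes = [[b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"],
               [b"\x0f\x0e\x0d\x0c", b"\x55\x66\x77\x88", b"\x00\xff\x00\xff"]]
    p0 = stripe_parity(stripes[0])          # per-stripe parity (first level)
    sp = super_parity(stripes)              # super-group parity (second level)
    lost = (0, 2)                           # pretend this disklet is unreadable
    rebuilt = recover_with_super_parity(stripes, lost, sp)
    print(rebuilt == stripes[0][2])         # True
```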
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366188
Fei Yang, I. Augé-Blum
Duty-cycling prolongs the lifetime of battery-powered wireless sensor networks (WSNs). However, it incurs additional delay because nodes may be asleep. In addition to energy constraints, many applications have real-time constraints, which means the sink has to be informed before a deadline when an event occurs. Moreover, wireless links among low-power radios are highly unreliable. These factors pose significant challenges for designing protocols for real-time applications. In this paper, a novel forwarding scheme based on distributed wakeup scheduling is proposed that guarantees a bounded delay and achieves a higher delivery ratio for ultra-low duty-cycle WSNs under unreliable links. The proposed wakeup scheduling algorithm schedules the wakeup time of each node according to its hop number and expected delivery ratio to the sink. We model the forwarding scheme and analyze its properties. Simulation results show that the proposed algorithm performs better in terms of delivery ratio and end-to-end delay.
Title: "On maximizing the delivery ratio of ultra low duty-cycle WSNs under real-time constraints"
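A simplified sketch of hop-based staggered wakeups and the resulting delay bound; the period, slot, and hop parameters are assumptions, and the paper's scheme additionally weighs the expected delivery ratio over unreliable links, which is not modeled here:

```python
def wakeup_offset(hop, max_hop, slot=0.02):
    """Within each period, a node at `hop` hops from the sink wakes
    (max_hop - hop) slots after the period start, so a packet forwarded
    hop-by-hop toward the sink always finds its next-hop node awake."""
    return (max_hop - hop) * slot

def worst_case_delay(src_hop, period=2.0, slot=0.02):
    """Loose delay bound: at most one period waiting for the source's next
    wakeup, plus one slot per hop as the packet rides the wave to the sink."""
    return period + src_hop * slot

if __name__ == "__main__":
    MAX_HOP, PERIOD, SLOT = 10, 2.0, 0.02   # hypothetical network parameters
    for hop in (10, 5, 1):
        print(f"hop {hop:>2}: wake offset {wakeup_offset(hop, MAX_HOP, SLOT):.2f}s, "
              f"worst-case delay {worst_case_delay(hop, PERIOD, SLOT):.2f}s")
```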
Pub Date: 2009-12-28 | DOI: 10.1109/MASCOT.2009.5366603
Zhibin Yu, Hai Jin, Jing Chen, L. John
Accelerating micro-architecture simulation is becoming increasingly urgent as the complexity of workloads and simulated processors increases. This paper presents a novel two-stage sampling (TSS) scheme to accelerate sampling-based simulation. It first selects large samples from the dynamic instruction stream as candidates for detailed simulation, and then samples small groups from each selected first-stage sample for detailed simulation. Since the distribution of the standard deviation of cycles per instruction (CPI) is insensitive to microarchitecture, TSS can be used to speed up design-space exploration by splitting the sampling process into two stages, which removes redundant instruction samples from detailed simulation when the program is in a stable phase (the standard deviation of CPI is near zero). It also adopts systematic sampling to accelerate the functional warm-up in sampling simulation. Experimental results show that, by combining these two techniques, TSS achieves an average and maximum speedup of 1.3 and 2.29 over SMARTS, with an average CPI relative error of less than 3%. TSS can significantly accelerate the time-consuming iterative early design evaluation process.
Title: "TSS: Applying two-stage sampling in micro-architecture simulations"
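A toy sketch of the two-stage selection idea, assuming window sizes, group counts, and a CPI-stability threshold that are purely illustrative; the actual TSS derives CPI variability from the functional warm-up pass and applies systematic sampling there as well:

```python
import random

def two_stage_sample(num_instructions, stage1_size, stage1_count,
                     stage2_size, stage2_count, cpi_std_estimate, seed=0):
    """Pick instruction ranges for detailed simulation in two stages.

    Stage 1: choose large candidate windows systematically across the stream.
    Stage 2: within each candidate, choose small groups for detailed simulation;
    fewer groups are taken when the estimated CPI standard deviation of that
    window is near zero (a stable program phase).
    """
    rng = random.Random(seed)
    stride = num_instructions // stage1_count
    detailed_ranges = []
    for i in range(stage1_count):
        window_start = i * stride
        # Stable phases need fewer detailed groups; cpi_std_estimate[i] would
        # come from a cheap functional-simulation pass in a real setting.
        groups = 1 if cpi_std_estimate[i] < 0.05 else stage2_count
        for _ in range(groups):
            offset = rng.randrange(0, stage1_size - stage2_size)
            start = window_start + offset
            detailed_ranges.append((start, start + stage2_size))
    return detailed_ranges

if __name__ == "__main__":
    # Hypothetical 100M-instruction stream, 10 candidate windows of 1M each,
    # up to 4 detailed groups of 10k instructions per window.
    std = [0.01, 0.30, 0.02, 0.02, 0.40, 0.01, 0.01, 0.25, 0.02, 0.01]
    ranges = two_stage_sample(100_000_000, 1_000_000, 10, 10_000, 4, std)
    print(f"{len(ranges)} detailed-simulation ranges, e.g. {ranges[:3]}")
```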