
Latest Publications: 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems

Performance and Power Consumption Measurement of Java Application Servers
Hitoshi Oi, Sho Niboshi
In this paper, we present our in-progress project of modeling the performance and power consumption of Java application servers using SPECjEnterprise2010. We run the workload on two application servers using two different CPUs, AMD Phenom II and Intel Atom, and investigate performance and power consumption behavior as the system size increases. We have observed that: (1) CPU utilization is a non-linear function of the system size, and its shape differs between Phenom and Atom. However, power consumption on both servers increases proportionally. (2) The Browse transaction is the source of the non-linearity in the CPU utilization. (3) Estimating the CPU utilization from that of each transaction measured separately incurs large errors (up to 65%), while the errors in the estimation of the power consumption are relatively small (up to 4%).
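Observation (3) above rests on an additive estimation: composing the utilization and power measured for each transaction type in isolation. A minimal sketch of that style of estimate, with entirely made-up per-transaction numbers rather than the paper's measurements, might look like this:

```python
# Toy additive estimation of CPU utilization and power from
# per-transaction measurements (hypothetical numbers, not the paper's data).

# Per-transaction cost measured in isolation: utilization fraction and
# incremental power (watts) contributed per request/second.
per_txn_util = {"browse": 0.004, "manage": 0.002, "purchase": 0.003}
per_txn_power = {"browse": 0.15, "manage": 0.08, "purchase": 0.10}

idle_power_w = 60.0                                         # assumed idle power
txn_rate = {"browse": 50, "manage": 20, "purchase": 10}     # requests/second

# Additive estimate: sum the per-transaction contributions.
est_util = sum(per_txn_util[t] * r for t, r in txn_rate.items())
est_power = idle_power_w + sum(per_txn_power[t] * r for t, r in txn_rate.items())

# Hypothetical measured values for the combined workload.
measured_util, measured_power = 0.41, 72.5

print(f"estimated util  {est_util:.2f}  vs measured {measured_util:.2f} "
      f"({abs(est_util - measured_util) / measured_util:.0%} error)")
print(f"estimated power {est_power:.1f} W vs measured {measured_power:.1f} W "
      f"({abs(est_power - measured_power) / measured_power:.0%} error)")
```

With these made-up inputs the utilization estimate misses badly while the power estimate is close, which is the qualitative pattern the abstract reports.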
{"title":"Performance and Power Consumption Measurement of Java Application Servers","authors":"Hitoshi Oi, Sho Niboshi","doi":"10.1109/MASCOTS.2012.68","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.68","url":null,"abstract":"In this paper, we present our in-progress project of modeling performance and power consumption of Java application servers using SPECjEnterprise2010. We run the workload on two application server using two different CPUs, AMD Phenom II and Intel Atom, and investigate performance and power consumption behaviors against the increasing system sizes. We have observed that: (1) CPU utilization draws non-linear functions of the system size and their shapes are different on Phenom and Atom. However, power consumption on both servers increase proportionally. (2) Browse transaction is the source of non-linearly in the CPU utilization. (3) Estimation of the CPU utilization from that of each transaction measured separately incurs large errors (up to 65%), while the errors in the estimation of the power consumption are relatively small (up to 4%).","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117186631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PCM-Based Durable Write Cache for Fast Disk I/O
Zhuo Liu, Bin Wang, Patrick Carpenter, Dong Li, J. Vetter, Weikuan Yu
Flash-based solid-state devices (FSSDs) have been adopted within the memory hierarchy to improve the performance of hard disk drive (HDD) based storage systems. However, with the fast development of storage-class memories, new storage technologies with better performance and higher write endurance than FSSDs are emerging, e.g., phase-change memory (PCM). Understanding how to leverage these state-of-the-art storage technologies in modern computing systems is important for solving challenging data-intensive computing problems. In this paper, we propose to leverage PCM in a hybrid PCM-HDD storage architecture. We identify the limitations of traditional LRU caching algorithms for PCM-based caches, and develop a novel hash-based write caching scheme called HALO to improve the random write performance of hard disks. To address the limited durability of PCM devices and the degraded spatial locality of traditional wear-leveling techniques, we further propose novel PCM management algorithms that provide effective wear-leveling while maximizing access parallelism. We have evaluated this PCM-based hybrid storage architecture using applications with a diverse set of I/O access patterns. Our experimental results demonstrate that the HALO caching scheme leads to an average reduction of 36.8% in execution time compared to the LRU caching scheme, and that the SFC wear leveling extends the lifetime of PCM by a factor of 21.6.
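The abstract does not spell out HALO's exact policy, so the sketch below only illustrates the general idea of a hash-indexed write cache: dirty blocks are bucketed by destination disk region so that each destage becomes a mostly sequential pass. The class name, region size, and cache limit are illustrative assumptions, not the paper's design.

```python
from collections import defaultdict

REGION_BLOCKS = 2048        # blocks per disk region (illustrative)
CACHE_LIMIT = 8192          # dirty blocks held in the PCM cache (illustrative)

class HashedWriteCache:
    """Toy PCM write cache: buckets dirty blocks by hashed disk region."""

    def __init__(self, disk_write):
        self.buckets = defaultdict(dict)   # region -> {lba: data}
        self.size = 0
        self.disk_write = disk_write       # callback: (lba, data) -> None

    def write(self, lba, data):
        region = lba // REGION_BLOCKS      # grouping key: destination region
        if lba not in self.buckets[region]:
            self.size += 1
        self.buckets[region][lba] = data   # rewrites are absorbed in the cache
        if self.size >= CACHE_LIMIT:
            self.flush_largest_bucket()

    def flush_largest_bucket(self):
        # Destage the region with the most dirty blocks; sorting by LBA turns
        # many random writes into one near-sequential pass of the disk head.
        region = max(self.buckets, key=lambda r: len(self.buckets[r]))
        for lba in sorted(self.buckets[region]):
            self.disk_write(lba, self.buckets[region][lba])
        self.size -= len(self.buckets.pop(region))

cache = HashedWriteCache(disk_write=lambda lba, data: None)  # stub backing disk
for lba in [7, 9000, 12, 9004, 3, 9001]:                     # random writes
    cache.write(lba, b"x")
cache.flush_largest_bucket()       # destages one fullest region (LBAs 3, 7, 12)
```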
{"title":"PCM-Based Durable Write Cache for Fast Disk I/O","authors":"Zhuo Liu, Bin Wang, Patrick Carpenter, Dong Li, J. Vetter, Weikuan Yu","doi":"10.1109/MASCOTS.2012.57","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.57","url":null,"abstract":"Flash based solid-state devices (FSSDs) have been adopted within the memory hierarchy to improve the performance of hard disk drive (HDD) based storage system. However, with the fast development of storage-class memories, new storage technologies with better performance and higher write endurance than FSSDs are emerging, e.g., phase-change memory (PCM). Understanding how to leverage these state-of the-art storage technologies for modern computing systems is important to solve challenging data intensive computing problems. In this paper, we propose to leverage PCM for a hybrid PCM-HDD storage architecture. We identify the limitations of traditional LRU caching algorithms for PCMbased caches, and develop a novel hash-based write caching scheme called HALO to improve random write performance of hard disks. To address the limited durability of PCM devices and solve the degraded spatial locality in traditional wear-leveling techniques, we further propose novel PCM management algorithms that provide effective wear-leveling while maximizing access parallelism. We have evaluated this PCM-based hybrid storage architecture using applications with a diverse set of I/O access patterns. Our experimental results demonstrate that the HALO caching scheme leads to an average reduction of 36.8% in execution time compared to the LRU caching scheme, and that the SFC wear leveling extends the lifetime of PCM by a factor of 21.6.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122346071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
Emulating a Shingled Write Disk
Rekha Pitchumani, A. Hospodor, A. Amer, Yangwook Kang, E. L. Miller, D. Long
Shingled Magnetic Recording technology is expected to play a major role in the next generation of hard disk drives. But it introduces some unique challenges to system software researchers, and prototype hardware is not readily available to the broader research community. It is crucial to work on system software in parallel with hardware manufacturing to ensure successful and effective adoption of this technology. In this work, we present a novel Shingled Write Disk (SWD) emulator that uses a hard disk employing traditional Perpendicular Magnetic Recording (PMR) and emulates a shingled write disk on top of it. We implemented the emulator as a pseudo block device driver and evaluated the performance overhead incurred by employing the emulator. The emulator has a slight overhead that is only measurable during purely sequential reads and writes. The moment disk head movement comes into the picture, due to any random access, the emulator overhead becomes so insignificant as to be immeasurable.
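The emulator's internals are not described in the abstract; the fragment below is only a schematic of the core constraint such an emulator has to impose on an ordinary PMR disk (represented here by a plain in-memory store): within a shingled band, writes may only append at the band's write pointer, since overlapping tracks make in-place updates unsafe. The band geometry and class names are assumptions.

```python
BAND_BLOCKS = 256            # blocks per shingled band (illustrative geometry)

class ShingledWriteEmulator:
    """Toy SWD emulator on top of a flat backing store (list of blocks)."""

    def __init__(self, num_bands):
        self.store = [None] * (num_bands * BAND_BLOCKS)   # stands in for the PMR disk
        self.write_ptr = [0] * num_bands                  # per-band append pointer

    def read(self, lba):
        return self.store[lba]                            # reads are unrestricted

    def write(self, lba, data):
        band, offset = divmod(lba, BAND_BLOCKS)
        if offset != self.write_ptr[band]:
            # An in-place or backward write would clobber the overlapping
            # downstream tracks of the band, so the emulator rejects it.
            raise IOError(f"non-append write to band {band} at offset {offset}")
        self.store[lba] = data
        self.write_ptr[band] += 1

    def reset_band(self, band):
        # Whole-band reset is the only way to reclaim space, as on real SMR media.
        start = band * BAND_BLOCKS
        self.store[start:start + BAND_BLOCKS] = [None] * BAND_BLOCKS
        self.write_ptr[band] = 0

swd = ShingledWriteEmulator(num_bands=4)
swd.write(0, b"a"); swd.write(1, b"b")    # sequential appends succeed
try:
    swd.write(0, b"c")                    # rewrite of block 0 is refused
except IOError as e:
    print(e)
```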
{"title":"Emulating a Shingled Write Disk","authors":"Rekha Pitchumani, A. Hospodor, A. Amer, Yangwook Kang, E. L. Miller, D. Long","doi":"10.1109/MASCOTS.2012.46","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.46","url":null,"abstract":"Shingled Magnetic Recording technology is expected to play a major role in the next generation of hard disk drives. But it introduces some unique challenges to system software researchers and prototype hardware is not readily available for the broader research community. It is crucial to work on system software in parallel to hardware manufacturing, to ensure successful and effective adoption of this technology. In this work, we present a novel Shingled Write Disk (SWD) emulator that uses a hard disk utilizing traditional Perpendicular Magnetic Recording (PMR) and emulates a Shingled Write Disk on top of it. We implemented the emulator as a pseudo block device driver and evaluated the performance overhead incurred by employing the emulator. The emulator has a slight overhead which is only measurable during pure sequential reads and writes. The moment disk head movement comes into picture, due to any random access, the emulator overhead becomes so insignificant as to become immeasurable.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128678583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Accelerating Multi-threaded Application Simulation through Barrier-Interval Time-Parallelism
Paul D. Bryan, Jason A. Poovey, Jesse G. Beu, T. Conte
In the last decade, the microprocessor industry has undergone a dramatic change, ushering in the new era of multi-/manycore processors. As new designs incorporate increasing core counts, simulation technology has not kept pace, resulting in simulation times that increasingly dominate the design cycle. Complexities associated with the execution of code and communication between simulated cores have presented new obstacles for the simulation of manycore designs. Hence, many techniques developed to accelerate uniprocessor simulation cannot be easily adapted to accelerate manycore simulation. In this work, a novel time-parallel barrier-interval simulation methodology is presented to rapidly accelerate the simulation of certain classes of multi-threaded workloads. A program delineated into intervals by barriers may be accurately simulated in parallel. This approach avoids challenges originating from unknown thread progressions, since the program location of each executing thread is known. For the workloads tested, wall-clock speedups range from 1.22× to 596×, with an average of 13.94×. Furthermore, this approach allows the estimation of stable performance metrics such as cycle counts with minimal losses in accuracy (2% on average for all tested workloads). The proposed technique provides a fast and accurate mechanism to rapidly accelerate particular classes of manycore simulations.
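As a rough picture of the time-parallel step (the paper's interval formation, warm-up, and reconciliation details are not reproduced), the toy below cuts a workload at barrier points, hands each barrier-delimited interval to a separate worker process, and combines the per-interval cycle estimates. The trace format and the cost model are placeholders.

```python
from concurrent.futures import ProcessPoolExecutor

def simulate_interval(interval):
    """Placeholder detailed simulation of one barrier-delimited interval.

    Each interval starts at a barrier, so its simulation is independent of the
    others; here the 'simulation' is just a toy 3-cycles-per-instruction model.
    """
    return sum(3 * instructions for instructions in interval["per_thread_instructions"])

def split_at_barriers(trace):
    """Cut a (toy) program trace into intervals wherever the threads hit a barrier."""
    intervals, current = [], []
    for event in trace:
        if event == "BARRIER":
            intervals.append({"per_thread_instructions": current})
            current = []
        else:
            current.append(event)
    if current:
        intervals.append({"per_thread_instructions": current})
    return intervals

if __name__ == "__main__":
    # Per-thread instruction counts separated by barrier markers (placeholder data).
    trace = [1000, 1200, "BARRIER", 800, 950, "BARRIER", 500, 700]
    intervals = split_at_barriers(trace)
    with ProcessPoolExecutor() as pool:                      # the time-parallel step
        cycles_per_interval = list(pool.map(simulate_interval, intervals))
    print("estimated total cycles:", sum(cycles_per_interval))
```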
{"title":"Accelerating Multi-threaded Application Simulation through Barrier-Interval Time-Parallelism","authors":"Paul D. Bryan, Jason A. Poovey, Jesse G. Beu, T. Conte","doi":"10.1109/MASCOTS.2012.23","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.23","url":null,"abstract":"In the last decade, the microprocessor industry has undergone a dramatic change, ushering in the new era of multi-/manycore processors. As new designs incorporate increasing core counts, simulation technology has not matched pace, resulting in simulation times that increasingly dominate the design cycle. Complexities associated with the execution of code and communication between simulated cores has presented new obstacles for the simulation of manycore designs. Hence, many techniques developed to accelerate uniprocessor simulation cannot be easily adapted to accelerate manycore simulation. In this work, a novel time-parallel barrier-interval simulation methodology is presented to rapidly accelerate the simulation of certain classes of multi-threaded workloads. A program delineated into intervals by barriers may be accurately simulated in parallel. This approach avoids challenges originating from unknown thread progressions, since the program location of each executing thread is known. For the workloads tested, wall-clock speedups range from 1.22× to 596×, with an average of 13.94×. Furthermore, this approach allows the estimation of stable performance metrics such as cycle counts with minimal losses in accuracy (2%, on average, for all tested workloads). The proposed technique provides a fast and accurate mechanism to rapidly accelerate particular classes of manycore simulations.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128014302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Fair Scheduling on Parallel Bonded Channels with Intersecting Bonding Groups
Gongbing Hong, James J. Martin, S. Moser, J. Westall
We describe an efficient scheduling technique for providing weighted sharing of aggregate capacity in networks having parallel bonded channels in which a single channel may simultaneously be a member of multiple bonding groups. Our work is motivated by the introduction of this capability into version 3 of the Data Over Cable Service Interface Specification (DOCSIS). Our technique extends Golestani's self-clocked fair queuing algorithm (SCFQ). We illustrate its weighted fair-sharing properties via simulation and provide some analytic results that establish fairness under certain conditions. We also demonstrate that round robin based techniques such as weighted deficit round robin do not extend equally easily and effectively to this environment.
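The bonding-group extension is the paper's contribution and is not reproduced here; the sketch below is only baseline SCFQ on a single channel, which the paper builds on: each arriving packet of flow i receives a finish tag F = max(v, F_i_last) + length/w_i, where v is the finish tag of the packet currently in service, and the packet with the smallest tag is served next.

```python
import heapq
from collections import defaultdict

class SCFQ:
    """Minimal self-clocked fair queuing on one channel (no bonding groups)."""

    def __init__(self, weights):
        self.weights = weights            # flow id -> weight
        self.last_finish = defaultdict(float)
        self.virtual_time = 0.0           # finish tag of the packet in service
        self.queue = []                   # heap of (finish_tag, seq, flow, length)
        self.seq = 0

    def enqueue(self, flow, length):
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.queue, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        finish, _, flow, length = heapq.heappop(self.queue)
        self.virtual_time = finish        # self-clocking: v tracks the served packet's tag
        return flow, length

sched = SCFQ(weights={"A": 3, "B": 1})    # flow A is entitled to 3x flow B's share
for _ in range(4):
    sched.enqueue("A", 1500)
    sched.enqueue("B", 1500)
order = [sched.dequeue()[0] for _ in range(8)]
print(order)    # A, A, B, A, A, B, B, B: A dominates while both flows are backlogged
```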
{"title":"Fair Scheduling on Parallel Bonded Channels with Intersecting Bonding Groups","authors":"Gongbing Hong, James J. Martin, S. Moser, J. Westall","doi":"10.1109/MASCOTS.2012.20","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.20","url":null,"abstract":"We describe an efficient scheduling technique for providing weighted sharing of aggregate capacity in networks having parallel bonded channels in which a single channel may simultaneously be a member of multiple bonding groups. Our work is motivated by the introduction of this capability into version 3 of the Data Over Cable Service Interface Specification (DOCSIS). Our technique extends Golestani's self-clocked fair queuing algorithm (SCFQ). We illustrate its weighted fair-sharing properties via simulation and provide some analytic results that establish fairness under certain conditions. We also demonstrate that round robin based techniques such as weighted deficit round robin do not extend equally easily and effectively to this environment.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130803171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Making Write Less Blocking for Read Accesses in Phase Change Memory
Jianhui Yue, Yifeng Zhu
Phase-change memory (PCM) is a promising alternative or complement to DRAM for its non-volatility, scalable bit density, and fast read performance. Nevertheless, PCM faces two serious challenges: extraordinarily slow write speed and less-than-desirable write endurance. While recent research has improved the write endurance significantly, slow write speed has become a more prominent issue and prevents PCM from being widely used in real systems. To improve write speed, this paper proposes a new memory micro-architecture, called Parallel Chip PCM (PC2M), which leverages the spatial locality of memory accesses and trades bank-level parallelism for larger chip-level parallelism. We also present a micro-write scheme to reduce the blocking of read accesses caused by uninterrupted serialized writes. Micro-write breaks a large write into multiple smaller writes and schedules newly arriving reads immediately after a small write completes. Our design is orthogonal to many existing PCM write-hiding techniques, and thus can be used to further optimize PCM performance. Based on simulation experiments with a multi-core processor under SPEC CPU 2006 multi-programmed workloads, our proposed techniques reduce the memory latency of standard PCM by 68.5% and improve system performance by 30.3% on average. PC2M and micro-write significantly outperform existing approaches.
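The micro-write idea, splitting one long PCM write into smaller pieces so that queued reads can slip in between pieces, can be sketched at the request-queue level as follows; the chunk size and queue handling are assumptions, not the paper's controller design.

```python
from collections import deque

MICRO_WRITE_BYTES = 64            # assumed size of one micro-write

def service(write_req, read_queue, do_read, do_write):
    """Serve one large write as micro-writes, draining pending reads between pieces."""
    addr, data = write_req
    for offset in range(0, len(data), MICRO_WRITE_BYTES):
        do_write(addr + offset, data[offset:offset + MICRO_WRITE_BYTES])
        # A read that arrived during the micro-write is scheduled immediately
        # after that piece completes, instead of waiting behind the whole write.
        while read_queue:
            do_read(read_queue.popleft())

reads = deque([0x10, 0x20])
log = []
service((0x1000, bytes(256)), reads,
        do_read=lambda a: log.append(("R", a)),
        do_write=lambda a, d: log.append(("W", a, len(d))))
print(log)   # first 64-byte micro-write, then both pending reads, then the remaining pieces
```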
{"title":"Making Write Less Blocking for Read Accesses in Phase Change Memory","authors":"Jianhui Yue, Yifeng Zhu","doi":"10.1109/MASCOTS.2012.39","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.39","url":null,"abstract":"Phase-change Memory (PCM) is a promising alternative or complement to DRAM for its non-volatility, scalable bit density, and fast read performance. Nevertheless, PCM has two serious challenges including extraordinarily slow write speed and less-than-desirable write endurance. While recent research has improved the write endurance significantly, slow write speed become a more prominent issue and prevents PCM from being widely used in real systems. To improve write speed, this paper proposes a new memory micro-architecture, called Parallel Chip PCM(PC2M), which leverages the spatial locality of memory accesses and trades bank-level parallelism for larger chip-level parallelism. We also present a micro-write scheme to reduce the blocking for read accesses caused by uninterrupted serialized writes. Micro-write breaks a large write into multiple smaller writes and timely schedules newly arriving reads immediately after a small write completes. Our design is orthogonal to many existing PCM write hiding techniques, and thus can be used to further optimize PCM performance. Based on simulation experiments of a multi-core processor under SPEC CPU 2006 multi-programmed workloads, our proposed techniques can reduce the memory latency of standard PCM by 68.5% and improve the system performance by 30.3% on average. PC2M and Micro-write significantly outperform existing approaches.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123516625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
MMPP Characterization of Web Application Traffic
Ali Rajabi, J. Wong
Web application traffic has been shown to exhibit burstiness. The traditional model based on the Poisson process is unable to capture this burstiness. On the other hand, the Markov-modulated Poisson process (MMPP) has been used successfully to model bursty traffic in a variety of computing environments. In this paper, we conduct experiments to investigate the effectiveness of the MMPP as a traffic model in the context of resource provisioning for web applications. We first extend an available workload generator to produce a synthetic trace of job arrivals with controlled burstiness. We next consider an existing algorithm, as well as a variant of this algorithm, to fit an MMPP to the synthetic trace; each of them is used to obtain values for the MMPP parameters. The effectiveness of the MMPP is then evaluated by comparing performance results through simulation, using as input the synthetic trace and the job arrivals generated by the estimated MMPP.
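For reference, the sketch below generates arrivals from a two-state MMPP: a continuous-time Markov chain alternates between a quiet and a bursty state, and while in state i arrivals form a Poisson process of rate lambda_i. The rates and switching intensities are arbitrary illustrative values, not the parameters fitted in the paper.

```python
import random

def mmpp2_arrivals(horizon, lam=(1.0, 20.0), switch=(0.1, 0.5), seed=1):
    """Arrival times from a 2-state MMPP over [0, horizon).

    lam[i]    - Poisson arrival rate while the modulating chain is in state i
    switch[i] - rate of leaving state i (exponential holding times)
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < horizon:
        stay = rng.expovariate(switch[state])        # sojourn time in this state
        end = min(t + stay, horizon)
        while True:                                  # Poisson arrivals within the sojourn
            t += rng.expovariate(lam[state])
            if t >= end:
                break
            arrivals.append(t)
        t = end
        state = 1 - state                            # switch between quiet and bursty
    return arrivals

arr = mmpp2_arrivals(horizon=100.0)
print(len(arr), "arrivals; bursts occur while the chain sits in the high-rate state")
```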
{"title":"MMPP Characterization of Web Application Traffic","authors":"Ali Rajabi, J. Wong","doi":"10.1109/MASCOTS.2012.22","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.22","url":null,"abstract":"Web application traffic has been shown to exhibit burstiness. The traditional model based on Poisson process is unable to capture the burstiness in traffic. On the other hand, the Markov-modulated Poisson process (MMPP) has been successfully used to model bursty traffic in a variety of computing environments. In this paper, we conduct experiments to investigate the effectiveness of MMPP as a traffic model in the context of resource provisioning in web applications. We first extend an available workload generator to produce a synthetic trace of job arrivals with controlled burstiness. We next consider an existing algorithm, as well as a variant of this algorithm, to fit an MMPP to the synthetic trace; each of them is used to obtain values for the MMPP parameters. The effectiveness of MMPP is then evaluated by comparing the performance results through simulation, using as input the synthetic trace and job arrivals generated by the estimated MMPP.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127246010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Coordinated VM Resizing and Server Tuning: Throughput, Power Efficiency and Scalability
Yanfei Guo, Xiaobo Zhou
Performance control and power management of virtual machines (VMs) are two major research issues in modern data centers. They are challenging due to the complexity of hosted Internet applications, highly dynamic workloads, and the shared virtualized infrastructure. Obtaining a model relating VM capacity, server configuration, performance, and power consumption is a very hard problem even for just one application. In this paper, we propose and develop GARL, a genetic algorithm with a multi-agent reinforcement learning approach for coordinated VM resizing and server tuning. In GARL, model-independent reinforcement learning agents generate VM capacity and server configuration options, and the genetic algorithm evaluates different combinations of those options to maximize a global utilization function of system throughput and power efficiency. The multi-agent design makes GARL a scalable approach, which is important as more and more applications are hosted in data centers using cloud services. We build a testbed in a prototype data center and deploy multiple RUBiS benchmark applications. We apply a power budget in the testbed and observe superior system throughput and power efficiency with GARL. Experimental results also show that GARL significantly outperforms a representative reinforcement learning based approach in performance control. GARL shows better scalability when compared to a centralized approach.
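GARL's agents, state spaces, and utility function are not given in the abstract; the toy below only mirrors the coordination pattern described there: candidate VM-capacity and server-configuration options (drawn at random here rather than proposed by reinforcement-learning agents) are combined by a genetic algorithm that keeps the combinations scoring highest on a made-up throughput-per-watt utility under a power cap. Every constant and formula is an assumption.

```python
import random

rng = random.Random(0)
VM_CPUS = [1, 2, 4, 8]                   # candidate VM capacity options (assumed)
SERVER_FREQ = [1.6, 2.0, 2.6, 3.2]       # candidate server frequency settings, GHz (assumed)

def utility(cpus, freq):
    """Made-up global utility: throughput per watt, zero if the power cap is exceeded."""
    throughput = 100 * cpus * freq / (1 + 0.05 * cpus * freq)
    power = 50 + 12 * cpus * freq ** 2
    return throughput / power if power < 400 else 0.0

def evolve(generations=30, pop_size=20):
    pop = [(rng.choice(VM_CPUS), rng.choice(SERVER_FREQ)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: utility(*c), reverse=True)
        parents = pop[: pop_size // 2]                 # selection
        children = []
        while len(children) < pop_size - len(parents):
            (c1, f1), (c2, f2) = rng.sample(parents, 2)
            child = (c1, f2)                           # crossover: mix the two knobs
            if rng.random() < 0.2:                     # mutation: re-draw the VM knob
                child = (rng.choice(VM_CPUS), child[1])
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda c: utility(*c))

best = evolve()
print("best (vm_cpus, server_ghz):", best, "utility:", round(utility(*best), 3))
```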
{"title":"Coordinated VM Resizing and Server Tuning: Throughput, Power Efficiency and Scalability","authors":"Yanfei Guo, Xiaobo Zhou","doi":"10.1109/MASCOTS.2012.41","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.41","url":null,"abstract":"Performance control and power management in virtualized machines (VM) are two major research issues in modern data centers. They are challenging due to complexities of hosted Internet applications, high dynamics in workloads and the shared virtualized infrastructure. Obtaining a model among VM capacity, server configuration, performance and power consumption is a very hard problem even for just one application. In this paper, we propose and develop GARL, a genetic algorithm with multi-agent reinforcement learning approach for coordinated VM resizing and server tuning. In GARL, model-independent reinforcement learning agents generate VM capacity and server configuration options and the genetic algorithm evaluates different combinations of those options for maximizing a global utilization function of system throughput and power efficiency. The multi-agent design makes GARL a scalable approach, which is important as more and more applications are hosted in data centers using cloud services. We build a testbed in a prototype data center and deploy multiple RUBiS benchmark applications. We apply a power budget in the testbed and observe superior system throughput and power efficiency of GARL. Experimental results also find that GARL significantly outperforms a representative reinforcement learning based approach in performance control. GARL shows better scalability when compared to a centralized approach.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130206796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
OpenAirInterface Traffic Generator (OTG): A Realistic Traffic Generation Tool for Emerging Application Scenarios
A. Hafsaoui, N. Nikaein, Lusheng Wang
Traffic generation represents one of the main challenges in modeling and simulating application and network load. In this work, we present a tool, called OpenAirInterface Traffic Generator (OTG), for the generation of realistic application traffic that can be used for testing and evaluating the performance of emerging networking architectures. In addition to the traffic of conventional applications, OTG is capable of accurately emulating the traffic of new application scenarios such as online gaming and machine-type communication. To highlight the capabilities and new features of the tool, the one-way delay of the OpenArena online gaming application in the presence of background traffic is analyzed over an LTE network using the OpenAirInterface emulation platform.
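OTG's actual traffic models are not described in the abstract; the sketch below only illustrates the general idea of per-application traffic profiles, reducing a gaming-like flow and a background flow to assumed inter-departure-time and packet-size distributions. All profile names and numbers are hypothetical.

```python
import random

# Assumed per-application profiles: (mean inter-departure time in s, mean packet size in bytes).
PROFILES = {
    "online_gaming": (0.040, 80),        # small, frequent packets (illustrative)
    "background_tcp": (0.010, 1200),     # bulk background traffic (illustrative)
}

def generate(app, duration, seed=0):
    """Yield (send_time, packet_size) pairs for one application flow."""
    rng = random.Random(seed)
    mean_idt, mean_size = PROFILES[app]
    t = 0.0
    while t < duration:
        t += rng.expovariate(1.0 / mean_idt)           # exponential inter-departure times
        yield t, max(40, int(rng.gauss(mean_size, mean_size * 0.2)))

gaming = list(generate("online_gaming", duration=10.0))
bg = list(generate("background_tcp", duration=10.0, seed=1))
print(len(gaming), "gaming packets and", len(bg), "background packets in 10 s")
```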
{"title":"OpenAirInterface Traffic Generator (OTG): A Realistic Traffic Generation Tool for Emerging Application Scenarios","authors":"A. Hafsaoui, N. Nikaein, Lusheng Wang","doi":"10.1109/MASCOTS.2012.62","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.62","url":null,"abstract":"Traffic generation represents one of the main challenge in modeling and simulating the application and network load. In this work, we present a tool, called OpenAirInterface Traffic Generator (OTG), for the generation of realistic application traffic that can be used for testing and evaluating the performance of emerging networking architectures. In addition to the traffic of conventional applications, OTG is capable of accurately emulating the traffic of new application scenarios such as online gaming and machine-type communication. To highlight the capability and new features of the tool, the one-way delay of OpenArena online gaming application in the presence of the background traffic is analyzed over the LTE network using OpenAirInterface emulation platform.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130632360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Frequency Affinity: Analyzing and Maximizing Power Efficiency in Multi-core Systems
Gangyong Jia, Xi Li, Chao Wang, Xuehai Zhou, Zongwei Zhu
Performance optimization and energy efficiency are the major challenges in multi-core system design. Among state-of-the-art approaches, cache affinity aware scheduling and techniques based on dynamic voltage and frequency scaling (DVFS) are widely applied to improve performance and save energy, respectively. In modern operating systems, schedulers exploit high cache affinity by allocating a process to a recently used processor whenever possible. When a process runs on a high-affinity processor, it finds most of its state already in the cache and thus achieves higher efficiency. However, most state-of-the-art DVFS techniques do not concentrate on the cost analysis of the DVFS mechanism. In this paper, we first propose frequency affinity, which retains the voltage and frequency setting as long as possible to avoid frequent switching, and then present frequency affinity aware scheduling (FAS) to maximize power efficiency in multi-core systems. Experimental results demonstrate that our frequency affinity aware scheduling algorithms are much more power efficient than single-ISA heterogeneous multi-core processors.
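The abstract states the frequency-affinity idea, keep the current voltage/frequency level as long as runnable work still wants it, without detailing FAS itself; the sketch below is only that idea expressed as a run-queue ordering policy, with hypothetical per-task frequency preferences and switch cost.

```python
from collections import deque

SWITCH_COST_US = 50                      # assumed cost of one DVFS transition

def frequency_affinity_schedule(tasks):
    """Order tasks so DVFS switches happen only when the current level is exhausted."""
    queues = {}                          # preferred frequency -> FIFO of tasks
    for task, freq in tasks:
        queues.setdefault(freq, deque()).append(task)

    order, switches = [], 0
    current = next(iter(queues))         # start at the first task's preferred level
    while queues:
        if current not in queues:        # no runnable task wants the current level:
            current = next(iter(queues)) # only now do we pay for a frequency switch
            switches += 1
        order.append((queues[current].popleft(), current))
        if not queues[current]:
            del queues[current]
    return order, switches * SWITCH_COST_US

tasks = [("t1", 1.6), ("t2", 2.6), ("t3", 1.6), ("t4", 2.6), ("t5", 1.6)]
order, overhead = frequency_affinity_schedule(tasks)
print(order)                 # all 1.6 GHz tasks run back to back, then the 2.6 GHz ones
print("total switch overhead:", overhead, "us")   # one switch instead of four
```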
{"title":"Frequency Affinity: Analyzing and Maximizing Power Efficiency in Multi-core Systems","authors":"Gangyong Jia, Xi Li, Chao Wang, Xuehai Zhou, Zongwei Zhu","doi":"10.1109/MASCOTS.2012.63","DOIUrl":"https://doi.org/10.1109/MASCOTS.2012.63","url":null,"abstract":"Performance optimization and energy efficiency are the major challenges in multi-core system design. Of the state-of-the-art approaches, cache affinity aware scheduling and techniques based on dynamic voltage frequency scaling (DVFS) are widely applied to improve performance and save energy consumptions respectively. In modern operating systems, schedulers exploit high cache affinity by allocating a process on a recently used processor whenever possible. When a process runs on a high-affinity processor it will find most of its states already in the cache and will thus achieve more efficiency. However, most state-of-the-art DVFS techniques do not concentrate on the cost analysis for DVFS mechanism. In this paper, we firstly propose frequency affinity which retains the voltage frequency as long as possible to avoid frequently switching, and then present a frequency affinity aware scheduling (FAS) to maximize power efficiency for multi-core systems. Experimental results demonstrate our frequency affinity aware scheduling algorithms are much more power efficient than single-ISA heterogeneous multi-core processors.","PeriodicalId":278764,"journal":{"name":"2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132444897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5