In this paper, we present our in-progress project on modeling the performance and power consumption of Java application servers using SPECjEnterprise2010. We run the workload on two application servers using two different CPUs, AMD Phenom II and Intel Atom, and investigate performance and power consumption behavior as the system size increases. We have observed that: (1) CPU utilization is a non-linear function of the system size, with different shapes on Phenom and Atom, while power consumption on both servers increases proportionally. (2) The Browse transaction is the source of the non-linearity in the CPU utilization. (3) Estimating the CPU utilization of the full workload from that of each transaction measured separately incurs large errors (up to 65%), while the errors in the estimated power consumption are relatively small (up to 4%).
Hitoshi Oi, Sho Niboshi. "Performance and Power Consumption Measurement of Java Application Servers." 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2012). doi:10.1109/MASCOTS.2012.68
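Finding (3) concerns composing a whole-workload estimate from per-transaction measurements. A minimal sketch of such a linear estimator, assuming utilization scales with transaction rate; the paper's exact model is not given in the abstract, and all transaction names and rates below are illustrative only:

```python
# Hypothetical sketch: estimate mixed-workload CPU utilization as a linear
# combination of per-transaction utilizations measured in isolation.
# Transaction names and all numbers are illustrative, not from the paper.

def estimate_utilization(per_txn_util, mix):
    """per_txn_util: CPU utilization per unit request rate, measured with
    each transaction type running alone.
    mix: request rates of each transaction in the combined workload."""
    return sum(per_txn_util[t] * mix[t] for t in mix)

per_txn = {"browse": 0.004, "manage": 0.002, "purchase": 0.003}  # util per req/s
mix = {"browse": 50.0, "manage": 20.0, "purchase": 10.0}         # req/s
print(estimate_utilization(per_txn, mix))  # ≈ 0.27
```

The paper's point is precisely that such a composition works poorly for CPU utilization (errors up to 65%) but reasonably well for power.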
Zhuo Liu, Bin Wang, Patrick Carpenter, Dong Li, J. Vetter, Weikuan Yu
Flash-based solid-state devices (FSSDs) have been adopted within the memory hierarchy to improve the performance of hard disk drive (HDD) based storage systems. However, with the fast development of storage-class memories, new storage technologies with better performance and higher write endurance than FSSDs are emerging, e.g., phase-change memory (PCM). Understanding how to leverage these state-of-the-art storage technologies in modern computing systems is important for solving challenging data-intensive computing problems. In this paper, we propose to leverage PCM in a hybrid PCM-HDD storage architecture. We identify the limitations of traditional LRU caching algorithms for PCM-based caches, and develop a novel hash-based write caching scheme called HALO to improve the random write performance of hard disks. To address the limited durability of PCM devices and the degraded spatial locality of traditional wear-leveling techniques, we further propose novel PCM management algorithms that provide effective wear-leveling while maximizing access parallelism. We have evaluated this PCM-based hybrid storage architecture using applications with a diverse set of I/O access patterns. Our experimental results demonstrate that the HALO caching scheme leads to an average reduction of 36.8% in execution time compared to the LRU caching scheme, and that the SFC wear-leveling scheme extends the lifetime of PCM by a factor of 21.6.
"PCM-Based Durable Write Cache for Fast Disk I/O." MASCOTS 2012. doi:10.1109/MASCOTS.2012.57
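For context on the baseline HALO is compared against: a PCM-resident LRU write cache absorbs writes and destages the least recently written block to the HDD when full. A minimal sketch of that baseline only; HALO's hash-based layout is not detailed in the abstract, and the capacity and flush hook below are illustrative:

```python
# Sketch of an LRU write cache (the baseline the paper compares HALO to).
# The cache absorbs random writes in PCM and destages the least recently
# written block to the HDD on overflow. Capacity and flush are illustrative.
from collections import OrderedDict

class LRUWriteCache:
    def __init__(self, capacity, flush):
        self.capacity, self.flush = capacity, flush
        self.cache = OrderedDict()          # block -> data, in LRU order

    def write(self, block, data):
        if block in self.cache:
            self.cache.move_to_end(block)   # refresh recency on rewrite
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            victim, vdata = self.cache.popitem(last=False)
            self.flush(victim, vdata)       # destage victim to the HDD

flushed = []
c = LRUWriteCache(2, lambda b, d: flushed.append(b))
for b in [1, 2, 1, 3]:
    c.write(b, b"x")
print(flushed)  # [2] — block 2 was least recently written
```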
Rekha Pitchumani, A. Hospodor, A. Amer, Yangwook Kang, E. L. Miller, D. Long
Shingled Magnetic Recording technology is expected to play a major role in the next generation of hard disk drives, but it introduces unique challenges for system software researchers, and prototype hardware is not readily available to the broader research community. It is crucial to work on system software in parallel with hardware manufacturing to ensure successful and effective adoption of this technology. In this work, we present a novel Shingled Write Disk (SWD) emulator that uses a hard disk with traditional Perpendicular Magnetic Recording (PMR) and emulates a Shingled Write Disk on top of it. We implemented the emulator as a pseudo block device driver and evaluated the performance overhead it incurs. The emulator has a slight overhead that is only measurable during pure sequential reads and writes. As soon as disk head movement comes into the picture due to any random access, the emulator overhead becomes insignificant to the point of being immeasurable.
"Emulating a Shingled Write Disk." MASCOTS 2012. doi:10.1109/MASCOTS.2012.46
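The core constraint such an emulator must model is that a band of overlapping tracks only accepts sequential appends at its write pointer; rewriting an earlier block logically invalidates the rest of the band. A hedged sketch of that constraint; band geometry and the rewrite policy here are illustrative, not taken from the paper:

```python
# Sketch of the shingled-write constraint an SWD emulator must model:
# a band of overlapping tracks accepts only appends at its write pointer,
# and rewriting an earlier block forces a band rewrite because it would
# clobber the later (downstream) tracks. Geometry is illustrative.

class ShingledBand:
    def __init__(self, size_blocks):
        self.size = size_blocks
        self.write_ptr = 0  # next block writable without damaging the band

    def write(self, block_offset):
        if block_offset >= self.size:
            raise ValueError("offset outside band")
        if block_offset == self.write_ptr:
            self.write_ptr += 1          # sequential append: safe
            return "append"
        elif block_offset < self.write_ptr:
            self.write_ptr = block_offset + 1
            return "band-rewrite"        # earlier block: rest of band lost
        else:
            raise ValueError("cannot skip ahead within a shingled band")

band = ShingledBand(size_blocks=1024)
print(band.write(0))  # append
print(band.write(1))  # append
print(band.write(0))  # band-rewrite
```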
Paul D. Bryan, Jason A. Poovey, Jesse G. Beu, T. Conte
In the last decade, the microprocessor industry has undergone a dramatic change, ushering in the new era of multi-/manycore processors. As new designs incorporate increasing core counts, simulation technology has not matched pace, resulting in simulation times that increasingly dominate the design cycle. Complexities associated with the execution of code and communication between simulated cores have presented new obstacles for the simulation of manycore designs. Hence, many techniques developed to accelerate uniprocessor simulation cannot be easily adapted to accelerate manycore simulation. In this work, a novel time-parallel barrier-interval simulation methodology is presented to rapidly accelerate the simulation of certain classes of multi-threaded workloads. A program delineated into intervals by barriers may be accurately simulated in parallel. This approach avoids challenges originating from unknown thread progressions, since the program location of each executing thread is known. For the workloads tested, wall-clock speedups range from 1.22× to 596×, with an average of 13.94×. Furthermore, this approach allows the estimation of stable performance metrics such as cycle counts with minimal losses in accuracy (2%, on average, for all tested workloads). The proposed technique provides a fast and accurate mechanism to rapidly accelerate particular classes of manycore simulations.
"Accelerating Multi-threaded Application Simulation through Barrier-Interval Time-Parallelism." MASCOTS 2012. doi:10.1109/MASCOTS.2012.23
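The idea can be sketched as follows: because each thread's program location at a barrier is known, the barrier-delimited intervals can be simulated independently and their per-interval cycle counts summed. A toy illustration in which the per-interval core model is a stand-in, not the paper's simulator:

```python
# Toy sketch of time-parallel simulation over barrier-delimited intervals:
# thread positions at barriers are known, so each interval is simulated
# independently and the cycle counts are summed. simulate_interval is a
# stand-in core model (one cycle per instruction), purely illustrative.
from concurrent.futures import ThreadPoolExecutor

def simulate_interval(interval):
    # Stand-in core model: pretend each instruction costs one cycle.
    return sum(1 for _insn in interval)

def time_parallel_cycles(trace, barrier_positions):
    # Split the instruction trace at barrier positions into intervals.
    cuts = [0] + list(barrier_positions) + [len(trace)]
    intervals = [trace[a:b] for a, b in zip(cuts, cuts[1:])]
    with ThreadPoolExecutor() as pool:       # simulate intervals in parallel
        return sum(pool.map(simulate_interval, intervals))

trace = list(range(100))                      # 100 dummy instructions
print(time_parallel_cycles(trace, [30, 60]))  # 100
```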
Gongbing Hong, James J. Martin, S. Moser, J. Westall
We describe an efficient scheduling technique for providing weighted sharing of aggregate capacity in networks with parallel bonded channels, in which a single channel may simultaneously be a member of multiple bonding groups. Our work is motivated by the introduction of this capability in version 3 of the Data Over Cable Service Interface Specification (DOCSIS). Our technique extends Golestani's self-clocked fair queuing algorithm (SCFQ). We illustrate its weighted fair-sharing properties via simulation and provide some analytic results that establish fairness under certain conditions. We also demonstrate that round-robin-based techniques such as weighted deficit round robin do not extend equally easily and effectively to this environment.
"Fair Scheduling on Parallel Bonded Channels with Intersecting Bonding Groups." MASCOTS 2012. doi:10.1109/MASCOTS.2012.20
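For reference, the SCFQ rule the paper extends assigns each arriving packet a finish tag F = max(V, F_prev) + length/weight, always serves the packet with the smallest tag, and advances the virtual time V to the tag of the packet in service. A minimal single-channel sketch of that base rule; the paper's multi-channel, intersecting-bonding-group extension is not shown:

```python
# Single-channel sketch of Golestani's SCFQ, the base algorithm the paper
# extends to bonded channels. Each packet gets finish tag
#   F = max(V, F_prev_of_flow) + length / weight
# and the scheduler serves the smallest tag, setting V to the served tag.
import heapq

class SCFQ:
    def __init__(self):
        self.v = 0.0            # system virtual time
        self.last_tag = {}      # per-flow finish tag of last queued packet
        self.queue = []         # heap of (finish_tag, seq, flow, length)
        self.seq = 0            # tie-breaker: FIFO among equal tags

    def enqueue(self, flow, length, weight):
        tag = max(self.v, self.last_tag.get(flow, 0.0)) + length / weight
        self.last_tag[flow] = tag
        heapq.heappush(self.queue, (tag, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        tag, _, flow, length = heapq.heappop(self.queue)
        self.v = tag            # SCFQ: virtual time tracks the served tag
        return flow, length

s = SCFQ()
s.enqueue("a", 100, 2.0)   # tag 50
s.enqueue("b", 100, 1.0)   # tag 100
s.enqueue("a", 100, 2.0)   # tag 100 (50 + 100/2)
print([s.dequeue()[0] for _ in range(3)])  # ['a', 'b', 'a']
```

Flow "a" (weight 2) is served twice as often as flow "b" (weight 1) under sustained backlogs of equal-sized packets, which is the weighted-sharing property the paper generalizes.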
Phase-change memory (PCM) is a promising alternative or complement to DRAM due to its non-volatility, scalable bit density, and fast read performance. Nevertheless, PCM faces two serious challenges: extraordinarily slow write speed and less-than-desirable write endurance. While recent research has improved the write endurance significantly, slow write speed has become a more prominent issue and prevents PCM from being widely used in real systems. To improve write speed, this paper proposes a new memory micro-architecture, called Parallel Chip PCM (PC2M), which leverages the spatial locality of memory accesses and trades bank-level parallelism for larger chip-level parallelism. We also present a micro-write scheme to reduce the blocking of read accesses caused by uninterrupted serialized writes. Micro-write breaks a large write into multiple smaller writes and promptly schedules newly arriving reads immediately after a small write completes. Our design is orthogonal to many existing PCM write-hiding techniques, and can thus be used to further optimize PCM performance. Based on simulation experiments with a multi-core processor under SPEC CPU 2006 multi-programmed workloads, our proposed techniques reduce the memory latency of standard PCM by 68.5% and improve system performance by 30.3% on average. PC2M and micro-write significantly outperform existing approaches.
Jianhui Yue, Yifeng Zhu. "Making Write Less Blocking for Read Accesses in Phase Change Memory." MASCOTS 2012. doi:10.1109/MASCOTS.2012.39
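The micro-write idea can be illustrated with a toy scheduler: a large write is split into chunks, and a pending read is served after each chunk instead of waiting behind the whole write. The chunk size and event interface below are illustrative, not the paper's:

```python
# Toy sketch of the micro-write scheduling idea: split a large PCM write
# into chunks and interleave waiting reads between chunks, so reads are not
# blocked behind the entire serialized write. Sizes are illustrative.

def schedule(write_size, chunk, pending_reads):
    """Return the order of operations: each waiting read is served right
    after a write chunk completes, rather than after the full write."""
    ops = []
    reads = list(pending_reads)
    remaining = write_size
    while remaining > 0:
        ops.append(("write-chunk", min(chunk, remaining)))
        remaining -= chunk
        if reads:                       # serve one waiting read per chunk
            ops.append(("read", reads.pop(0)))
    ops.extend(("read", r) for r in reads)
    return ops

print(schedule(write_size=256, chunk=64, pending_reads=["r1", "r2"]))
```

With a 256-byte write and 64-byte chunks, reads r1 and r2 are served after the first and second chunks instead of after all four.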
Web application traffic has been shown to exhibit burstiness. The traditional model based on the Poisson process is unable to capture this burstiness, whereas the Markov-modulated Poisson process (MMPP) has been used successfully to model bursty traffic in a variety of computing environments. In this paper, we conduct experiments to investigate the effectiveness of MMPP as a traffic model in the context of resource provisioning for web applications. We first extend an available workload generator to produce a synthetic trace of job arrivals with controlled burstiness. We then consider an existing algorithm, as well as a variant of it, to fit an MMPP to the synthetic trace; each yields values for the MMPP parameters. The effectiveness of MMPP is then evaluated by comparing performance results obtained through simulation, using as input the synthetic trace and job arrivals generated by the estimated MMPP.
Ali Rajabi, J. Wong. "MMPP Characterization of Web Application Traffic." MASCOTS 2012. doi:10.1109/MASCOTS.2012.22
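For concreteness, a two-state MMPP alternates between states via an exponential background Markov chain while arrivals follow a Poisson process at the current state's rate. A minimal generator sketch; all rate values below are illustrative, not fitted parameters from the paper:

```python
# Sketch of a two-state MMPP arrival generator of the kind used to model
# bursty web traffic: a background chain switches between a "bursty" and a
# "quiet" state, and arrivals are Poisson at the current state's rate.
# All rates below are illustrative. Jumping straight to the switch time
# when the next candidate arrival would land past it is valid because the
# exponential distribution is memoryless.
import random

def mmpp2_arrivals(t_end, rates=(10.0, 0.5), switch=(1.0, 1.0), seed=1):
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    next_switch = rng.expovariate(switch[state])
    while t < t_end:
        dt = rng.expovariate(rates[state])       # next candidate arrival
        if t + dt < next_switch:
            t += dt
            arrivals.append(t)
        else:                                    # state change comes first
            t = next_switch
            state = 1 - state
            next_switch = t + rng.expovariate(switch[state])
    return [a for a in arrivals if a <= t_end]

arr = mmpp2_arrivals(t_end=100.0)
print(len(arr))  # arrival count depends on the seed
```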
Performance control and power management of virtual machines (VMs) are two major research issues in modern data centers. They are challenging due to the complexity of hosted Internet applications, high dynamics in workloads, and the shared virtualized infrastructure. Obtaining a model relating VM capacity, server configuration, performance, and power consumption is very hard even for a single application. In this paper, we propose and develop GARL, a genetic algorithm with a multi-agent reinforcement learning approach for coordinated VM resizing and server tuning. In GARL, model-independent reinforcement learning agents generate VM capacity and server configuration options, and the genetic algorithm evaluates different combinations of those options to maximize a global utility function of system throughput and power efficiency. The multi-agent design makes GARL scalable, which is important as more and more applications are hosted in data centers through cloud services. We built a testbed in a prototype data center and deployed multiple RUBiS benchmark applications. Under a power budget applied to the testbed, we observe superior system throughput and power efficiency with GARL. Experimental results also show that GARL significantly outperforms a representative reinforcement-learning-based approach in performance control. GARL shows better scalability when compared to a centralized approach.
Yanfei Guo, Xiaobo Zhou. "Coordinated VM Resizing and Server Tuning: Throughput, Power Efficiency and Scalability." MASCOTS 2012. doi:10.1109/MASCOTS.2012.41
Traffic generation represents one of the main challenges in modeling and simulating application and network load. In this work, we present a tool, called the OpenAirInterface Traffic Generator (OTG), for generating realistic application traffic that can be used for testing and evaluating the performance of emerging networking architectures. In addition to the traffic of conventional applications, OTG is capable of accurately emulating the traffic of new application scenarios such as online gaming and machine-type communication. To highlight the capabilities and new features of the tool, the one-way delay of the OpenArena online gaming application in the presence of background traffic is analyzed over an LTE network using the OpenAirInterface emulation platform.
A. Hafsaoui, N. Nikaein, Lusheng Wang. "OpenAirInterface Traffic Generator (OTG): A Realistic Traffic Generation Tool for Emerging Application Scenarios." MASCOTS 2012. doi:10.1109/MASCOTS.2012.62
Gangyong Jia, Xi Li, Chao Wang, Xuehai Zhou, Zongwei Zhu
Performance optimization and energy efficiency are major challenges in multi-core system design. Among state-of-the-art approaches, cache-affinity-aware scheduling and techniques based on dynamic voltage and frequency scaling (DVFS) are widely applied to improve performance and reduce energy consumption, respectively. In modern operating systems, schedulers exploit cache affinity by allocating a process to a recently used processor whenever possible; a process running on a high-affinity processor finds most of its state already in the cache and thus executes more efficiently. However, most state-of-the-art DVFS techniques do not analyze the cost of the DVFS mechanism itself. In this paper, we first propose frequency affinity, which retains the current voltage and frequency setting as long as possible to avoid frequent switching, and then present frequency affinity aware scheduling (FAS) to maximize power efficiency in multi-core systems. Experimental results demonstrate that our frequency affinity aware scheduling algorithms are much more power efficient than single-ISA heterogeneous multi-core processors.
"Frequency Affinity: Analyzing and Maximizing Power Efficiency in Multi-core Systems." MASCOTS 2012. doi:10.1109/MASCOTS.2012.63