A trace-driven emulation framework to predict scalability of large clusters in presence of OS Jitter

2008 IEEE International Conference on Cluster Computing. Pub Date: 2008-10-31. DOI: 10.1109/CLUSTR.2008.4663776

Pradipta De, Ravina Kothari, V. Mann


Citations: 13

Abstract



Various studies have pointed out the debilitating effects of OS jitter on the performance of parallel applications on large clusters such as ASCI Purple and Mare Nostrum at the Barcelona Supercomputing Center. These clusters run the commodity OSes AIX and Linux, respectively. The biggest hindrance in evaluating any jitter-mitigation technique is getting access to such large-scale production HPC systems running a commodity OS. An earlier attempt to solve this problem emulated the effects of OS jitter on more widely available, jitter-free systems such as BlueGene/L. In this paper, we point out the shortcomings of such earlier approaches and present the design and implementation of an emulation framework that helps overcome them. We collect jitter traces on a commodity OS in the configuration whose scaling behavior we want to study. These traces are then replayed on a jitter-free system to predict scalability in the presence of OS jitter. We illustrate the framework through a comparative scalability study of an off-the-shelf Linux distribution in a minimal configuration (runlevel 1) and the highly optimized embedded Linux distribution that runs on the IO nodes of BlueGene/L. We validate the emulation results on both a single node and a real cluster. Our results indicate that an optimized OS combined with a technique to synchronize jitter can reduce the performance degradation due to jitter at 2048 processors from 99% (off-the-shelf Linux without any synchronization) to a much more tolerable 6% (highly optimized BlueGene/L IO node Linux with synchronization). Furthermore, perfect synchronization can give linear scaling with less than 1% slowdown, regardless of the type of OS used. However, as the jitter at different nodes becomes desynchronized, even with a minor skew across nodes, the optimized OS starts outperforming the off-the-shelf OS.
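The trace-replay idea and the effect of synchronization can be illustrated with a toy model. The sketch below is not the authors' framework: it assumes a bulk-synchronous application with a barrier after every step, a synthetic periodic jitter trace in place of a recorded one, and arbitrary parameters (`period`, `duration`, `work`) that are not taken from the paper. All function names are hypothetical.

```python
import random


def make_trace(steps, period, duration, offset=0):
    """Synthetic per-step jitter trace: `duration` of delay once every
    `period` steps, shifted by `offset`. A stand-in for a recorded trace."""
    return [duration if (i - offset) % period == 0 else 0.0
            for i in range(steps)]


def replay(traces, work):
    """Replay one trace per node through a barrier-synchronized loop.

    In each step every node performs `work` units of compute plus its
    recorded jitter; the barrier makes the step as slow as the slowest node.
    """
    total = 0.0
    for i in range(len(traces[0])):
        total += work + max(trace[i] for trace in traces)
    return total


def slowdown(num_nodes, synchronized, steps=1000, work=1.0,
             period=100, duration=1.0, seed=0):
    """Fractional slowdown relative to a jitter-free run.

    synchronized=True aligns jitter on all nodes (offset 0 everywhere);
    otherwise each node gets a random phase offset (assumed model of skew).
    """
    rng = random.Random(seed)
    offsets = [0 if synchronized else rng.randrange(period)
               for _ in range(num_nodes)]
    traces = [make_trace(steps, period, duration, off) for off in offsets]
    ideal = steps * work
    return (replay(traces, work) - ideal) / ideal


if __name__ == "__main__":
    for n in (1, 16, 256, 2048):
        print(n, slowdown(n, synchronized=True), slowdown(n, synchronized=False))
```

In this model the synchronized slowdown stays constant as nodes are added, because all nodes lose the same steps, while the unsynchronized slowdown grows with node count, because each additional node raises the chance that some node is delayed in any given step. The numbers it prints depend entirely on the assumed parameters and should not be compared with the figures reported in the abstract.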
