Durango: Scalable Synthetic Workload Generation for Extreme-Scale Application Performance Modeling and Simulation
C. Carothers, J. Meredith, Mark P. Blanco, J. Vetter, M. Mubarak, Justin M. LaPre, S. Moore
DOI: 10.1145/3064911.3064923
Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next-generation supercomputing systems, because it is impractical to construct prototype systems at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces that can be difficult to work with because of their size and their lack of flexibility to extend or scale up without rerunning the original application. To address this problem, we have created a new technique for generating scalable, flexible workloads from real applications and have implemented a prototype, called Durango, that combines a proven analytical performance modeling language, Aspen, with the massively parallel HPC network modeling capabilities of the CODES framework. Our models are compact, parameterized, and representative of real applications with computation events. They are not resource-intensive to create and are portable across simulator environments. We demonstrate the utility of Durango by simulating the LULESH application in the CODES simulation environment on several topologies and show that Durango is practical to use for simulation without loss of fidelity, as quantified by simulation metrics. During our validation of Durango's generated communication model of LULESH, we found that the original LULESH miniapp code had a latent bug in which the MPI_Waitall operation was used incorrectly. This finding underscores the potential need for a tool such as Durango, beyond its benefits for flexible workload generation and modeling. Additionally, we demonstrate the efficacy of Durango's direct integration approach, which links Aspen into CODES as part of the running network simulation model. Here, Aspen generates the application-level computation timing events, which in turn drive the start of a network communication phase. Results show that Durango's performance scales well when executing both torus and dragonfly network models on up to 4K Blue Gene/Q nodes using 32K MPI ranks, and that Durango avoids the overheads and complexities associated with extreme-scale trace files.
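The abstract does not describe how MPI_Waitall was misused in LULESH. As a purely illustrative aside, the minimal C++/MPI sketch below shows one common pattern of such a bug in a halo exchange: waiting on the potential number of neighbour requests rather than the number actually posted. The routine, buffer layout, and neighbour bound are hypothetical and are not taken from the LULESH code.

```cpp
// Hypothetical halo-exchange sketch (not LULESH code).  A common MPI_Waitall
// pitfall: sizing the wait on the *potential* neighbour count instead of the
// number of requests actually posted, so the call sees unset request slots.
#include <mpi.h>
#include <vector>

void exchange_halos(const std::vector<int>& neighbours,   // neighbour ranks (at most 26 here)
                    std::vector<double>& send_buf,
                    std::vector<double>& recv_buf,
                    MPI_Comm comm)
{
    const int max_neighbours = 26;                         // 3-D stencil upper bound
    std::vector<MPI_Request> reqs(2 * max_neighbours);     // slots for all potential requests

    int posted = 0;
    for (std::size_t i = 0; i < neighbours.size(); ++i) {
        MPI_Irecv(&recv_buf[i], 1, MPI_DOUBLE, neighbours[i], 0, comm, &reqs[posted++]);
        MPI_Isend(&send_buf[i], 1, MPI_DOUBLE, neighbours[i], 0, comm, &reqs[posted++]);
    }

    // Buggy variant: waits on every slot, including ones that never received a
    // request handle.  Unless each unused slot has been set to MPI_REQUEST_NULL,
    // this call is erroneous and can hang or behave non-deterministically.
    // MPI_Waitall(2 * max_neighbours, reqs.data(), MPI_STATUSES_IGNORE);

    // Correct variant: wait only on the requests that were actually posted.
    MPI_Waitall(posted, reqs.data(), MPI_STATUSES_IGNORE);
}
```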
{"title":"Durango: Scalable Synthetic Workload Generation for Extreme-Scale Application Performance Modeling and Simulation","authors":"C. Carothers, J. Meredith, Mark P. Blanco, J. Vetter, M. Mubarak, Justin M. LaPre, S. Moore","doi":"10.1145/3064911.3064923","DOIUrl":"https://doi.org/10.1145/3064911.3064923","url":null,"abstract":"Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next generation supercomputing systems because it is impractical to construct prototype systems at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces that can be difficult to work with because of their size and lack of flexibility to extend or scale up without rerunning the original application. To address this problem, we have created a new technique for generating scalable, flexible workloads from real applications, we have implemented a prototype, called Durango, that combines a proven analytical performance modeling language, Aspen, with the massively parallel HPC network modeling capabilities of the CODES framework. Our models are compact, parameterized and representative of real applications with computation events. They are not resource intensive to create and are portable across simulator environments. We demonstrate the utility of Durango by simulating the LULESH application in the CODES simulation environment on several topologies and show that Durango is practical to use for simulation without loss of fidelity, as quantified by simulation metrics. During our validation of Durango's generated communication model of LULESH, we found that the original LULESH miniapp code had a latent bug where the MPI_Waitall operation was used incorrectly. This finding underscores the potential need for a tool such as Durango, beyond its benefits for flexible workload generation and modeling. Additionally, we demonstrate the efficacy of Durango's direct integration approach, which links Aspen into CODES as part of the running network simulation model. Here, Aspen generates the application-level computation timing events, which in turn drive the start of a network communication phase. Results show that Durango's performance scales well when executing both torus and dragonfly network models on up to 4K Blue Gene/Q nodes using 32K MPI ranks, Durango also avoids the overheads and complexities associated with extreme-scale trace files.","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116829940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Graph Partitioning Algorithm for Parallel Agent-Based Road Traffic Simulation
Yadong Xu, Wentong Cai, D. Eckhoff, Suraj Nair, A. Knoll
DOI: 10.1145/3064911.3064914
A common approach to parallelising an agent-based road traffic simulation is to partition the road network into sub-regions and assign the computations for each sub-region to a logical process (LP). Inter-process communication for synchronisation between the LPs is one of the major factors that affect the performance of parallel agent-based road traffic simulation in a distributed-memory environment. Synchronisation overhead, i.e., the number of messages and the volume of data exchanged between LPs, is heavily dependent on the employed road network partitioning algorithm. In this paper, we propose Neighbour-Restricting Graph-Growing (NRGG), a partitioning algorithm that tries to reduce the required communication between LPs by minimising the number of neighbouring partitions. Based on a road traffic simulation of the city of Singapore, we show that, for the synchronisation protocol used, our method not only outperforms graph partitioning methods such as METIS and Buffoon, but is also more resilient than stripe spatial partitioning when partitions are cut more finely.
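To illustrate the general graph-growing idea behind such partitioners (this is not the paper's NRGG algorithm), the sketch below grows one region at a time from a seed vertex and, when absorbing the next vertex, prefers frontier candidates with the fewest edges into other partitions, which tends to keep the number of neighbouring partitions small. The data structures, balance rule, and scoring are illustrative assumptions.

```cpp
// Minimal sketch of graph-growing partitioning with a neighbour penalty
// (illustrative only; not the NRGG algorithm from the paper).
#include <cstddef>
#include <limits>
#include <vector>

// Grows k partitions over an undirected graph given as an adjacency list.
std::vector<int> grow_partitions(const std::vector<std::vector<int>>& adj, int k)
{
    const int n = static_cast<int>(adj.size());
    std::vector<int> part(n, -1);
    if (k <= 0) return part;
    const int target = (n + k - 1) / k;                // rough balance constraint

    int seed = 0;
    for (int p = 0; p < k; ++p) {
        while (seed < n && part[seed] != -1) ++seed;   // next unassigned seed
        if (seed >= n) break;

        std::vector<int> frontier = {seed};
        int size = 0;
        while (size < target && !frontier.empty()) {
            // Pick the frontier vertex with the fewest edges to other partitions.
            std::size_t best = 0;
            int best_penalty = std::numeric_limits<int>::max();
            for (std::size_t f = 0; f < frontier.size(); ++f) {
                int penalty = 0;
                for (int nb : adj[frontier[f]])
                    if (part[nb] != -1 && part[nb] != p) ++penalty;
                if (penalty < best_penalty) { best_penalty = penalty; best = f; }
            }
            int v = frontier[best];
            frontier.erase(frontier.begin() + best);
            if (part[v] != -1) continue;               // duplicate frontier entry
            part[v] = p;
            ++size;
            for (int nb : adj[v])
                if (part[nb] == -1) frontier.push_back(nb);
        }
    }
    // Any leftover vertices (disconnected remnants) join the last partition.
    for (int v = 0; v < n; ++v)
        if (part[v] == -1) part[v] = k - 1;
    return part;
}
```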
{"title":"A Graph Partitioning Algorithm for Parallel Agent-Based Road Traffic Simulation","authors":"Yadong Xu, Wentong Cai, D. Eckhoff, Suraj Nair, A. Knoll","doi":"10.1145/3064911.3064914","DOIUrl":"https://doi.org/10.1145/3064911.3064914","url":null,"abstract":"A common approach of parallelising an agent-based road traffic simulation is to partition the road network into sub-regions and assign computations for each subregion to a logical process (LP). Inter-process communication for synchronisation between the LPs is one of the major factors that affect the performance of parallel agent-based road traffic simulation in a distributed memory environment. Synchronisation overhead, i.e., the number of messages and the communication data volume exchanged between LPs, is heavily dependent on the employed road network partitioning algorithm. In this paper, we propose Neighbour-Restricting Graph-Growing (NRGG), a partitioning algorithm which tries to reduce the required communication between LPs by minimising the number of neighbouring partitions. Based on a road traffic simulation of the city of Singapore, we show that our method not only outperforms graph partitioning methods such as METIS and Buffoon, for the synchronisation protocol used, but also is more resilient than stripe spatial partitioning when partitions are cut more ?nely.","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126088180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote II","authors":"Kevin Jin","doi":"10.1145/3254050","DOIUrl":"https://doi.org/10.1145/3254050","url":null,"abstract":"","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133644985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online Analysis of Simulation Data with Stream-based Data Mining
N. Feldkamp, S. Bergmann, S. Strassburger
DOI: 10.1145/3064911.3064915
Discrete event simulation is an accepted instrument for investigating the dynamic behavior of complex systems and evaluating processes. Usually, simulation experts conduct simulation experiments for a predetermined system specification by manually varying parameters based on educated assumptions and according to a previously defined goal. As an alternative, data farming and knowledge discovery in simulation data are increasingly popular methods for uncovering unknown relationships and effects in the model and gaining useful information about the underlying system. These methods usually demand broad-scale, data-intensive experimental designs, so computing time can quickly become large. To address this, we extend an existing concept of knowledge discovery in simulation data with an online stream mining component that delivers data mining results while experiments are still running. For this purpose, we introduce a method that combines decision tree classification with clustering algorithms for analyzing simulation output data, treating the flow of experiments as a data stream. A prototypical implementation demonstrates the basic applicability of the concept and opens up broad possibilities for future research.
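As a toy illustration of the streaming idea only (the paper combines decision-tree classification with clustering; this sketch shows just an online clustering step), the class below updates cluster centroids incrementally as simulation-output records arrive, so intermediate mining results are available while experiments are still running. The record layout and parameters are assumptions for illustration.

```cpp
// Toy online (streaming) k-means over simulation-output records as they arrive;
// illustrative of stream-based mining, not the paper's method.
#include <cmath>
#include <cstddef>
#include <vector>

class OnlineKMeans {
public:
    // k clusters over d-dimensional output records (e.g. the KPIs of one run).
    OnlineKMeans(std::size_t k, std::size_t d)
        : centroids_(k, std::vector<double>(d, 0.0)), counts_(k, 0) {}

    // Consume one record as soon as a simulation run finishes; returns the
    // index of the cluster the record was assigned to.
    std::size_t update(const std::vector<double>& x) {
        // Seed empty clusters with the first arriving records.
        for (std::size_t c = 0; c < counts_.size(); ++c)
            if (counts_[c] == 0) { counts_[c] = 1; centroids_[c] = x; return c; }

        // Otherwise assign to the nearest centroid.
        std::size_t best = 0;
        double best_dist = distance(centroids_[0], x);
        for (std::size_t c = 1; c < centroids_.size(); ++c) {
            double dist = distance(centroids_[c], x);
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        // Incremental mean update: the centroid moves towards x by 1/count.
        ++counts_[best];
        const double eta = 1.0 / static_cast<double>(counts_[best]);
        for (std::size_t j = 0; j < x.size(); ++j)
            centroids_[best][j] += eta * (x[j] - centroids_[best][j]);
        return best;
    }

    const std::vector<std::vector<double>>& centroids() const { return centroids_; }

private:
    static double distance(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
        return std::sqrt(s);
    }
    std::vector<std::vector<double>> centroids_;
    std::vector<long long> counts_;
};
```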
{"title":"Online Analysis of Simulation Data with Stream-based Data Mining","authors":"N. Feldkamp, S. Bergmann, S. Strassburger","doi":"10.1145/3064911.3064915","DOIUrl":"https://doi.org/10.1145/3064911.3064915","url":null,"abstract":"Discrete event simulation is an accepted instrument for investigating the dynamic behavior of complex systems and evaluating processes. Usually simulation experts conduct simulation experiments for a predetermined system specification by manually varying parameters through educated assumptions and according to a prior defined goal. As an alternative, data farming and knowledge discovery in simulation data are ongoing and popular methods in order to uncover unknown relationships and effects in the model to gain useful information about the underlying system. Those methods usually demand broad scale and data intensive experimental design, so computing time can quickly become large. As a solution to that, we extend an existing concept of knowledge discovery in simulation data with an online stream mining component to get data mining results even while experiments are still running. For this purpose, we introduce a method for using decision tree classification in combination with clustering algorithms for analyzing simulation output data that considers the flow of experiments as a data stream. A prototypical implementation proves the basic applicability of the concept and yields large possibilities for future research.","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123416273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code-transparent Discrete Event Simulation for Time-accurate Wireless Prototyping
Martin Serror, J. C. Kirchhof, Mirko Stoffers, Klaus Wehrle, J. Gross
DOI: 10.1145/3064911.3064913
Exhaustive testing of wireless communication protocols on prototypical hardware is costly and time-consuming. An alternative approach is network simulation, which, however, often strongly abstracts from the actual hardware. Especially in the wireless domain, such abstractions often lead to inaccurate simulation results. Therefore, we propose a code-transparent discrete event simulator that enables direct simulation of existing code for wireless prototypes. With a focus on the lower layers of the communication stack, we enable a parametrization of the simulation timings based on real-world measurements to increase the simulation accuracy. Our evaluation shows that we achieve results close to real-world setups for throughput (deviation below 3% for UDP) and latency (corrected deviation of about 13%), while providing the benefits of code-transparent simulation, i.e., the ability to flexibly simulate large topologies with existing prototype code. Moreover, we demonstrate that our approach finds implementation defects in existing hardware prototype software that are otherwise difficult to track down in real deployments.
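A minimal sketch of the general idea, not of the paper's simulator: run unmodified protocol handlers inside a discrete event loop and advance the simulated clock by per-handler execution times obtained from real-world measurements. The event layout, handler names, and timing table below are assumptions for illustration.

```cpp
// Sketch: discrete event loop that executes prototype code directly and
// charges measured execution times (illustrative assumptions throughout).
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Event {
    double time;                       // simulated time in microseconds
    std::string handler;               // name of the prototype code block to run
    std::function<void()> run;         // the unmodified prototype code itself
    bool operator>(const Event& o) const { return time > o.time; }
};

class TimedSimulator {
public:
    // Measured execution times (us) per handler, taken on the real hardware.
    explicit TimedSimulator(std::map<std::string, double> measured_us)
        : measured_us_(std::move(measured_us)) {}

    void schedule(double time, const std::string& handler, std::function<void()> run) {
        queue_.push(Event{time, handler, std::move(run)});
    }

    void run() {
        while (!queue_.empty()) {
            Event ev = queue_.top();
            queue_.pop();
            now_ = ev.time;
            ev.run();                                   // execute the prototype code
            auto it = measured_us_.find(ev.handler);
            if (it != measured_us_.end()) now_ += it->second;   // charge measured cost
        }
    }

    double now() const { return now_; }

private:
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue_;
    std::map<std::string, double> measured_us_;
    double now_ = 0.0;
};
```

A caller would, for example, schedule a hypothetical MAC transmit handler with `sim.schedule(0.0, "mac_tx", [&]{ mac_transmit(frame); });` so that the existing code runs unchanged while its measured cost shapes the simulated timeline.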
{"title":"Code-transparent Discrete Event Simulation for Time-accurate Wireless Prototyping","authors":"Martin Serror, J. C. Kirchhof, Mirko Stoffers, Klaus Wehrle, J. Gross","doi":"10.1145/3064911.3064913","DOIUrl":"https://doi.org/10.1145/3064911.3064913","url":null,"abstract":"Exhaustive testing of wireless communication protocols on prototypical hardware is costly and time-consuming. An alternative approach is network simulation, which, however, often strongly abstracts from the actual hardware. Especially in the wireless domain, such abstractions often lead to inaccurate simulation results. Therefore, we propose a code-transparent discrete event simulator that enables a direct simulation of existing code for wireless prototypes. With a focus on lower layers of the communication stack, we enable a parametrization of the simulation timings based on real-world measurements to increase the simulation accuracy. Our evaluation shows that we achieve close results for throughput (deviation below 3% for UDP and latency (corrected deviation about 13% compared to real-world setups, while providing the benefits of code-transparent simulation, i.e., to flexibly simulate large topologies with existing prototype code. Moreover, we demonstrate that our approach finds implementation defects in existing hardware prototype software, which are otherwise difficult to track down in real deployments.","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130219342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Framework for Validation of Network-based Simulation Models: an Application to Modeling Interventions of Pandemics
Sichao Wu, H. Mortveit, Sandeep Gupta
DOI: 10.1145/3064911.3064922
Network-based computer simulation models are powerful tools for analyzing and guiding policy formation related to the actual systems being modeled. However, the inherently data- and computation-intensive nature of this model class gives rise to fundamental challenges when it comes to executing typical experimental designs. In particular, this applies to model validation. Manual management of the complex simulation workflows, along with the associated data, will often require a broad combination of skills and expertise, including domain expertise, mathematical modeling, programming, high-performance computing, statistical design, data management, and the tracking of all assets and instances involved. This is a complex and error-prone process even with the best of practices, and even small slips may compromise model validation and significantly reduce human productivity. In this paper, we present a novel framework that addresses these challenges of model validation. The components of our framework form an ecosystem consisting of (i) model unification through a standardized model configuration format, (ii) simulation data management, (iii) support for experimental designs, and (iv) methods for uncertainty quantification and sensitivity analysis, all ultimately supporting the process of model validation. (Note that our view of validation is much more comprehensive than simply ensuring that the computational model can reproduce instances of historical data.) The design is extensible, so that domain experts from, e.g., experimental design can contribute to the collection of available algorithms and methods. Additionally, our solution directly supports reproducible computational experiments and analysis, which in turn facilitates independent model verification and validation. Finally, to showcase our design concept, we provide a sensitivity analysis examining the consequences of different intervention strategies for an influenza pandemic.
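As a schematic illustration of the kind of design-driven sensitivity analysis such a framework automates (not the paper's actual study), the sketch below performs a one-at-a-time sweep over intervention parameters, treating the simulation as a black-box function from parameter values to an output such as an attack rate. All names and parameters are hypothetical.

```cpp
// Schematic one-at-a-time sensitivity sweep over intervention parameters.
// The "model" is a black-box stand-in for a full network-based simulation run.
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct Parameter {
    std::string name;                 // e.g. "vaccination_rate" (hypothetical)
    double baseline;                  // value held fixed while others vary
    std::vector<double> levels;       // values explored in the design
};

// Runs the model once per level of each parameter, holding the others at their
// baselines, and reports the spread of the output across those runs.
void one_at_a_time(const std::vector<Parameter>& params,
                   const std::function<double(const std::vector<double>&)>& model)
{
    std::vector<double> baseline;
    for (const auto& p : params) baseline.push_back(p.baseline);

    for (std::size_t i = 0; i < params.size(); ++i) {
        double lo = 1e300, hi = -1e300;
        for (double level : params[i].levels) {
            std::vector<double> x = baseline;
            x[i] = level;
            double y = model(x);                      // one simulation run
            if (y < lo) lo = y;
            if (y > hi) hi = y;
        }
        std::printf("%s: output range [%g, %g] across %zu levels\n",
                    params[i].name.c_str(), lo, hi, params[i].levels.size());
    }
}
```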
{"title":"A Framework for Validation of Network-based Simulation Models: an Application to Modeling Interventions of Pandemics","authors":"Sichao Wu, H. Mortveit, Sandeep Gupta","doi":"10.1145/3064911.3064922","DOIUrl":"https://doi.org/10.1145/3064911.3064922","url":null,"abstract":"Network-based computer simulation models are powerful tools for analyzing and guiding policy formation related to the actual systems being modeled. However, the inherent data and computationally intensive nature of this model class gives rise to fundamental challenges when it comes to executing typical experimental designs. In particular this applies to model validation. Manual management of the complex simulation work-flows along with the associated data will often require a broad combination of skills and expertise. Examples of skills include domain expertise, mathematical modeling, programming, high-performance computing, statistical designs, data management as well as the tracking all assets and instances involved. This is a complex and error-prone process for the best of practices, and even small slips may compromise model validation and reduce human productivity in significant ways. In this paper, we present a novel framework that addresses the challenges of model validation just mentioned. The components of our framework form an ecosystem consisting of (i) model unification through a standardized model configuration format, (ii) simulation data management, (iii) support for experimental designs, and (iv) methods for uncertainty quantification, and sensitivity analysis, all ultimately supporting the process of model validation. (Note that our view of validation is much more comprehensive than simply ensuring that the computational model can reproduce instance of historical data.) This is an extensible design where domain experts from e.g. experimental design can contribute to the collection of available algorithms and methods. Additionally, our solution directly supports reproducible computational experiments and analysis, which in turn facilitates independent model verification and validation. Finally, to showcase our design concept, we provide a sensitivity analysis for examining the consequences of different intervention strategies for an influenza pandemic.","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126111221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Paper Session 1 Parallel Simulation I","authors":"R. Fujimoto","doi":"10.1145/3254049","DOIUrl":"https://doi.org/10.1145/3254049","url":null,"abstract":"","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121511866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Paper Session 7 Simulation Application II","authors":"Jiaqi Yan","doi":"10.1145/3254056","DOIUrl":"https://doi.org/10.1145/3254056","url":null,"abstract":"","PeriodicalId":341026,"journal":{"name":"Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121892324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}