Ketan Bahulkar, Jingjing Wang, N. Abu-Ghazaleh, D. Ponomarev
Partitioning plays an important role in PDES performance due to the high communication cost of parallel platforms and the fine granularity of most simulation models. Traditionally, models are partitioned by deriving the static communication graph of objects and applying graph partitioning to reduce the min-cut while balancing the number of objects per partition. However, many, if not all, models exhibit great diversity in their dynamic behavior: objects communicate with each other at widely varying frequencies that are commonly power-law distributed. Similar diversity exists in the activity of objects and the processing requirements of events. In this paper, we argue that partitioning based on static graphs ignores these effects, leading to poor partitions. We explore how partitioning based on dynamic information should be approached and examine policies that focus on communication cost, on load balancing, and on both. We show that on multicore clusters, dynamic partitioning achieves up to 4x better performance than static partitioning. On the AMD Magny-Cours, where communication latency is low, dynamic partitioning yields a 2x performance improvement over static partitioning for some of our models. Our future work considers how to derive the dynamic weights (in this study, we obtain them through profiling) and how to balance the importance of communication and computation in a way that is informed by the underlying architecture.
{"title":"Partitioning on Dynamic Behavior for Parallel Discrete Event Simulation","authors":"Ketan Bahulkar, Jingjing Wang, N. Abu-Ghazaleh, D. Ponomarev","doi":"10.1109/PADS.2012.32","DOIUrl":"https://doi.org/10.1109/PADS.2012.32","url":null,"abstract":"Partitioning plays an important role in PDES performance due to the high communication cost in parallel platforms and the fine-granularity of most simulation models. Traditionally, models are partitioned by deriving the static communication graph of objects and applying graph partitioning to reduce the mincut while load balancing the number of objects. However, many, if not all, models exhibit great diversity in their dynamic behavior: objects communicate with each other with diverse frequencies that are commonly power-law distributed. Similar diversity exists in the activity of objects and the processing requirements of events. In this paper, we argue that partitioning based on static graphs ignores these effects, leading to poor partitioning. We explore how partitioning based on dynamic information should be approached and explore policies that focus on communication cost, load balancing and both. We show that on multicore clusters, dynamic partitioning achieves up to 4x better performance than static partitioning. On the AMD magnycours, where the communication latency is low, dynamic partitioning results in a 2x performance improvement over static partitioning for some of our models. Our future work considers how to derive the dynamic weights (in this study, we do that through profiling), and how to balance the importance of communication and computation in a way that is informed by the underlying architecture.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116655492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design space explorations. In this paper, we present a parallel discrete event simulation scheme that enables cost- and time-efficient execution of large-scale parameter studies on GPUs. In order to efficiently accommodate the stream-processing paradigm of GPUs, our parallelization scheme exploits two orthogonal levels of parallelism: external parallelism among the inherently independent simulations of a parameter study, and internal parallelism among independent events within each individual simulation. Specifically, we design an event aggregation strategy based on external parallelism that generates workloads suitable for GPUs. In addition, we define a pipelined event execution mechanism based on internal parallelism to hide the transfer latencies between host and GPU memory. We analyze the performance characteristics of our parallelization scheme by means of a prototype implementation and show a 25-fold performance improvement over purely CPU-based execution.
{"title":"Multi-level Parallelism for Time- and Cost-Efficient Parallel Discrete Event Simulation on GPUs","authors":"G. Kunz, Daniel Schemmel, J. Gross, Klaus Wehrle","doi":"10.1109/PADS.2012.27","DOIUrl":"https://doi.org/10.1109/PADS.2012.27","url":null,"abstract":"Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design space explorations. In this paper, we present a parallel discrete event simulation scheme that enables cost- and time-efficient execution of large scale parameter studies on GPUs. In order to efficiently accommodate the stream-processing paradigm of GPUs, our parallelization scheme exploits two orthogonal levels of parallelism: External parallelism among the inherently independent simulations of a parameter study and internal parallelism among independent events within each individual simulation of a parameter study. Specifically, we design an event aggregation strategy based on external parallelism that generates workloads suitable for GPUs. In addition, we define a pipelined event execution mechanism based on internal parallelism to hide the transfer latencies between host- and GPU-memory. We analyze the performance characteristics of our parallelization scheme by means of a prototype implementation and show a 25-fold performance improvement over purely CPU-based execution.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"424 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123560066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dong Jin, Yuhao Zheng, Huaiyu Zhu, D. Nicol, Lenhard Winterrowd
A high-fidelity testbed for large-scale system analysis requires emulation to represent the execution of critical software and simulation to model an extensive ensemble of background computation and communication. We leverage prior work showing that large numbers of virtual environments may be emulated on a single host and that the time-stamped interactions between them can be mapped to virtual time, and we leverage existing work on simulation of large-scale communication networks. The present paper brings these concepts together, marrying the OpenVZ-based emulation framework (modified in earlier work to operate in virtual time) with the scalable network simulator S3F. Our algorithmic contributions lie in the design and management of virtual time as it transitions from emulation to simulation and back. In particular, inescapable uncertainties in emulation behavior force us to explicitly set and reset timestamps so as to avoid either the emulator or the simulator having to deal with a packet arriving in its logical past. We provide analytic bounds and empirical evidence that the error introduced in resetting timestamps is small. Finally, we present a case study that uses this capability to examine a cyber-attack on smart power grid communication infrastructure.
{"title":"Virtual Time Integration of Emulation and Parallel Simulation","authors":"Dong Jin, Yuhao Zheng, Huaiyu Zhu, D. Nicol, Lenhard Winterrowd","doi":"10.1109/PADS.2012.49","DOIUrl":"https://doi.org/10.1109/PADS.2012.49","url":null,"abstract":"A high fidelity testbed for large-scale system analysis requires emulation to represent the execution of critical software, and simulation to model an extensive ensemble of background computation and communication. We leverage prior work showing that large numbers of virtual environments may be emulated on a single host, and that the time stamped interactions between them can be mapped to virtual time, and we leverage existing work on simulation of large-scale communication networks. The present paper brings these concepts together, marrying the scale emulation framework OpenVZ (modified earlier to operate in virtual time) with a scalable network simulator S3F. Our algorithmic contributions lay in the design and management of virtual time as it transitions from emulation, to simulation, and back. In particular, inescapable uncertainties in emulation behavior force us to explicitly set and reset timestamps so as to avoid either emulator or simulator having to deal with a packet arriving in its logical past. We provide analytic bounds and empirical evidence that the error introduced in resetting timestamps is small. Finally, we present a case-study using this capability, of a cyber-attack with the smart power grid communication infrastructure.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114448333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the model splitting method for large-scale interactive network simulation, which addresses the separation of concerns between network researchers, who focus on developing complex network models and conducting large-scale network experiments, and simulator developers, who are concerned with building efficient simulation engines that achieve the best performance on parallel platforms. Model splitting divides the system into an interactive model, which supports user interaction, and an execution model, which facilitates parallel processing. We describe techniques to maintain consistency and real-time synchronization between the two models. We also provide solutions to reduce the memory complexity of large network models and to ensure data persistence and access efficiency for out-of-core processing.
{"title":"Realizing Large-Scale Interactive Network Simulation via Model Splitting","authors":"N. Vorst, Jason Liu","doi":"10.1109/PADS.2012.35","DOIUrl":"https://doi.org/10.1109/PADS.2012.35","url":null,"abstract":"This paper presents the model splitting method for large-scale interactive network simulation, which addresses the separation of concerns between network researchers, who focus on developing complex network models and conducting large-scale network experiments, and simulator developers, who are concerned with developing efficient simulation engines to achieve the best performance on parallel platforms. Modeling splitting divides the system into an interactive model to support user interaction, and an execution model to facilitate parallel processing. We describe techniques to maintain consistency and real-time synchronization between the two models. We also provide solutions to reduce the memory complexity of large network models and to ensure data persistency and access efficiency for out-of-core processing.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129435717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deepak Jagtap, Ketan Bahulkar, D. Ponomarev, N. Abu-Ghazaleh
The emergence of many-core architectures with a shifting balance between computation and communication overhead can have a tremendous impact on the performance and scalability of fine-grained parallel applications such as PDES. It may also be necessary to rethink the design philosophy of key PDES subsystems, which were traditionally focused on hiding long communication delays. In this paper, we perform an extensive evaluation of PDES on the TILEPro64, a 64-core chip from Tilera. For our studies, we use the recently developed multithreaded version of the popular ROSS simulator and show that, with the optimizations we propose, its performance scales by a factor of 27x on 56 cores of the Tilera chip for the PHOLD benchmark with 20% remote communication. We also evaluate the impact of the proposed performance optimizations on both the conservative and optimistic versions of the simulator and analyze sensitivity to various simulation parameters. Finally, we explore the issues of object placement and model partitioning on the Tilera architecture.
{"title":"Characterizing and Understanding PDES Behavior on Tilera Architecture","authors":"Deepak Jagtap, Ketan Bahulkar, D. Ponomarev, N. Abu-Ghazaleh","doi":"10.1109/PADS.2012.10","DOIUrl":"https://doi.org/10.1109/PADS.2012.10","url":null,"abstract":"The emergence of many core architectures with shifting balance between computation and communication overhead can have a tremendous impact on performance and scalability of fine-grained parallel applications such as PDES. It may also be necessary to rethink the design philosophy of key PDES subsystems, that were traditionally focussed on hiding long communication delays. In this paper, we perform extensive evaluation of PDES on Tile64Pro - a new 64-core chip from Tilera. For our studies, we use the recently developed multithreaded version of the popular ROSS simulator and show that the performance of this simulator (with many optimizations proposed) scales by a factor of 27X when it is executed on 56 cores of the Tilera chip for Phold benchmark with 20% remote communication. We also evaluate the impact of performance optimizations that we propose on both conservative and optimistic versions of the simulator and also analyze the sensitivity to various simulation parameters. Finally, we explore the issues of object placement and model partitioning on Tilera architecture.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115568415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A cyber-physical system (CPS) is a system featuring a tight combination and coordination between the system's computational and physical resources. As a CPS representative, the Weather Monitoring and Train Traffic Control Simulation System (WMT2CS2) includes two subsystems: a wireless sensor network front end and a train traffic control simulation subsystem. The sensing front end collects real-time weather data (wind speed and direction, rainfall, etc.) and feeds it to the simulation subsystem. The purpose of WMT2CS2 is to study the impact of weather on train traffic control, with the goal of enhancing the safety of high-speed rail (HSR) systems. However, the simulation system design faces new challenges such as accurate and fast time synchronization and fast data/command dissemination. In this paper, we propose an accurate, low-latency time synchronization protocol based on constructive interference (CI) for the sensing front end of such hybrid simulation systems. As a recently discovered physical-layer phenomenon, CI allows multiple nodes to transmit and forward an identical packet simultaneously. By leveraging CI, the proposed Radio-Driven Time Synchronization protocol (RDTS) achieves microsecond-level synchronization accuracy with millisecond-level latency. Moreover, RDTS directly uses the timestamps of the sink node instead of those of intermediate nodes, which avoids the error caused by the unstable clocks of intermediate nodes.
{"title":"A Radio-Driven Time Synchronization Protocol in Hybrid Simulation Systems","authors":"Zhiyu Huang","doi":"10.1109/PADS.2012.5","DOIUrl":"https://doi.org/10.1109/PADS.2012.5","url":null,"abstract":"Cyber-physical system (CPS) is a system featuring a tight combination and coordination between the system's computational and physical resources. As a CPS representative, the Weather Monitoring and Train Traffic Control Simulation System (WMT2CS2) includes two subsystems: the wireless sensor network front end and the train traffic control simulation subsystem. The sensing front end collects the real-time data of weathers(speeds and directions of winds and rainfalls, etc.), and connects to the simulation subsystem. The purpose of WMT2CS2 is to study the impact of weather on the train traffic control and envisions to enhance the safety of high-speed rail (HSR) system. However, the simulation system design faces new challenges such as accurate and fast time synchronization, fast data/command dissemination, and so on. In this paper, we propose an accurate and low-latency time synchronization protocol based on constructive interference (CI) to apply in the sensing front end of the hybrid simulation systems. As a recently discovered physical layer phenomenon, CI allows multiple nodes transmit and forward an identical packet simultaneously. By leveraging CI, the proposed Radio-Driven Time Synchronization protocol (RDTS) can realize microsecond time synchronization accuracy and milliseconds latency. Moreover, RDTS can directly utilize the time-stamps from the sink node instead of intermediate nodes, which avoids the error caused by the unstable clock of intermediate nodes.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126424713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time Warp synchronized parallel discrete event simulators are organized to operate asynchronously and aggressively without explicit synchronization between the concurrently executing simulators. In place of an explicit synchronization mechanism, the concurrent simulators maintain a common virtual clock model and implement a rollback/recovery mechanism to restore causal order when out-of-order events are detected. When the critical path of execution of the simulation is balanced across these parallel simulators, this can result in a highly effective, lightweight synchronization mechanism. However, imbalances in the workload across the parallel simulators can result in excessive rollback at some nodes and ultimately result in an overall slowing of the simulation as prematurely computed and transmitted events are processed. On small shared memory multi-core systems, a lowest time-stamp first scheduling policy can effectively balance the workload. However, on larger many-core chips, conventional load balancing and workload migration will once again become necessary. Fortunately, emerging many-core chips contain some interesting features that can potentially be exploited to improve the performance of parallel simulations. For example, the Intel Single-chip Cloud Computer (SCC) provides mechanisms that a running application can use to adjust the frequency/voltage of different regions (called islands) of the chip. These islands are network and processing core centric and thus, in a Time Warp simulation, one can increase the frequency of the cores executing threads on the critical path (those experiencing infrequent rollback) and decrease the frequency of the cores executing threads off the critical path (those experiencing excessive rollback). This paper investigates the run-time control and adjustment of core frequency in an AMD Phenom II X6 multi-core processor to explore and demonstrate that the dynamic run-time control of core frequency can sometimes improve the performance of a Time Warp synchronized parallel simulation.
{"title":"Dynamically Adjusting Core Frequencies to Accelerate Time Warp Simulations in Many-Core Processors","authors":"Ryan Child, P. Wilsey","doi":"10.1109/PADS.2012.15","DOIUrl":"https://doi.org/10.1109/PADS.2012.15","url":null,"abstract":"Time Warp synchronized parallel discrete event simulators are organized to operate asynchronously and aggressively without explicit synchronization between the concurrently executing simulators. In place of an explicit synchronization mechanism, the concurrent simulators maintain a common virtual clock model and implement a rollback/recovery mechanism to restore causal order when out-of-order events are detected. When the critical path of execution of the simulation is balanced across these parallel simulators, this can result in a highly effective, lightweight synchronization mechanism. However, imbalances in the workload across the parallel simulators can result in excessive rollback at some nodes and ultimately result in an overall slowing of the simulation as prematurely computed and transmitted events are processed. On small shared memory multi-core systems, a lowest time-stamp first scheduling policy can effectively balance the workload. However, on larger many-core chips, conventional load balancing and workload migration will once again become necessary. Fortunately, emerging many-core chips contain some interesting features that can potentially be exploited to improve the performance of parallel simulations. For example, the Intel Single-chip Cloud Computer (SCC) provides mechanisms that a running application can use to adjust the frequency/voltage of different regions (called islands) of the chip. These islands are network and processing core centric and thus, in a Time Warp simulation, one can increase the frequency of the cores executing threads on the critical path (those experiencing infrequent rollback) and decrease the frequency of the cores executing threads off the critical path (those experiencing excessive rollback). This paper investigates the run-time control and adjustment of core frequency in an AMD Phenom II X6 multi-core processor to explore and demonstrate that the dynamic run-time control of core frequency can sometimes improve the performance of a Time Warp synchronized parallel simulation.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117047253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We discuss our approach to federating dissimilar discrete event simulations, leveraging the strengths and design goals of both, to produce a detailed packet-level network model federated with a component-level model of an input-queued router. All existing network simulation tools that we are aware of incorporate a very simplistic model of the flow of packets through a router: the model responds to a packet receipt event by performing a route lookup and adding the packet to the output queue of the next-hop output interface. This is often simulated to take zero time, or with rudimentary probabilistic models of delay within the router. However, modern high-end routers are designed with a complex input-queuing methodology and a sophisticated scheduling approach that moves packets through a crossbar switch from the input queues to the output queues. We used the popular ns-3 network simulator to create realistic packet-level models of network load, and the Manifold computer architecture simulator to create a realistic model of data movement through an input-queued router. We federated the two by means of two alternative approaches: first, two POSIX threads run within a single simulation process and use shared memory for both time synchronization and packet exchange; second, we use the well-known MPI message-passing library for the federation. Our results show that the detailed router models can in fact produce somewhat different packet delay and loss characteristics than the simplistic router models, at the expense of considerable computational complexity.
{"title":"Hybrid Simulation of Packet-Level Networks and Functional-Level Routers","authors":"Mirko Stoffers, G. Riley","doi":"10.1109/PADS.2012.22","DOIUrl":"https://doi.org/10.1109/PADS.2012.22","url":null,"abstract":"We discuss our approach to federating dissimilar discrete event simulations, leveraging the strengths and design goals of both, to produce a packet-level detailed network model federated with a component-level detailed input-queuing router model. All existing network simulation tools that we are aware of incorporate a very simplistic model for the flow of packets through a router. The simplistic model simply responds to a packet receipt event by performing a route look-up and adding the packet to the output queue of the next-hop output interface. This is often simulated to take place in zero time, or with rudimentary probabilistic models of delay within a router. However, modern high-end routers are designed using a complex input-queuing methodology and a sophisticated scheduling approach to move packets through a crossbar switch from the input queue to the output queue. We used the popular ns -- 3 network simulator to create realistic packet-level models of network load, and the Manifold computer architecture simulator to create a realistic model of data movement through an input-queued router. We federated the two by means of two alternative approaches: First, two POSIX threads run within a single simulation process and utilize the shared memory for both time synchronization and packet exchange. Second, we used the well-known MPI message passing library for the federation. Our results show that the detailed router models can in fact produce somewhat different packet delay and loss characteristics than the simplistic router models at the expense of considerable computational complexity.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117129617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debugging is critically important for diagnosing bugs in programs. In optimistic Parallel Discrete Event Simulation (PDES), a bug may not be reproducible because events are processed in different orders in different simulation runs, so locating bugs is a great challenge when debugging PDES programs. To address this problem, we first propose a bug reproducing method based on a checkpoint/restart mechanism, which avoids restarting the program from scratch when an error emerges. Moreover, our method can change the checkpoint interval dynamically to reduce the overhead of state saving. Then, on top of bug reproduction, we propose a bug locating method that searches for the events likely to have caused the bug by comparing the event processing sequences of a passing run and the failing run. By doing this, we can focus on the events directly related to the bug, which reduces the time needed to locate it.
{"title":"A Bug Locating Method for the Debugging of Parallel Discrete Event Simulation","authors":"Feng Zhu, Yiping Yao","doi":"10.1109/PADS.2012.1","DOIUrl":"https://doi.org/10.1109/PADS.2012.1","url":null,"abstract":"Debugging is critically important for diagnosing bugs of programs. In optimistic Parallel Discrete Event Simulation(PDES), a bug is probably not to be reproduced for the different orders of event processing in different simulation runs, so locating bugs is of great challenge in debugging PDES programs. To solve this problem, we first propose a bug reproducing method based on checkpoint/restart mechanism, which avoids starting the program from scratch when an error emerges. Moreover, our method can change the checkpoint interval dynamically to reduce the overhead of states saving. Then, based on bug reproduction we propose a bug locating method, which aims at searching for these events that cause the bugs likely by comparing the event processing sequences between one passing test case and the failing test case. By doing this, we can focus on the events directly related to the bugs, which will reduce the time of locating a bug.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115364447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to diverse network latencies, participants in a Distributed Virtual Environment (DVE) may observe different inconsistency levels of the simulated virtual world, which can seriously affect fair competition among them. In this paper, we investigate how to disseminate Dead Reckoning (DR)-based updates with the objectives of achieving fairness among participants and reducing inconsistency as much as possible. We first propose an optimized bandwidth allocation scheme for sending updates to overcome the drawbacks of uniform bandwidth allocation and the local-lag technique. Then, we integrate bandwidth allocation with an indirect relay method and develop algorithms to select relay routes for minimizing inconsistency under various bandwidth allocation schemes. Our proposed scheme and algorithms are evaluated using traces collected from a real car racing game as well as the real Internet latency data. The experimental results show that the proposed optimized bandwidth allocation scheme significantly reduces inconsistency while maintaining fairness among participants and that integrating the optimized scheme with our proposed relay setup algorithm further improves consistency.
{"title":"Fair and Efficient Dead Reckoning-Based Update Dissemination for Distributed Virtual Environments","authors":"Zengxiang Li, Xueyan Tang, Wentong Cai, S. Turner","doi":"10.1109/PADS.2012.18","DOIUrl":"https://doi.org/10.1109/PADS.2012.18","url":null,"abstract":"Due to diverse network latencies, participants in a Distributed Virtual Environment (DVE) may observe different inconsistency levels of the simulated virtual world, which can seriously affect fair competition among them. In this paper, we investigate how to disseminate Dead Reckoning (DR)-based updates with the objectives of achieving fairness among participants and reducing inconsistency as much as possible. We first propose an optimized bandwidth allocation scheme for sending updates to overcome the drawbacks of uniform bandwidth allocation and the local-lag technique. Then, we integrate bandwidth allocation with an indirect relay method and develop algorithms to select relay routes for minimizing inconsistency under various bandwidth allocation schemes. Our proposed scheme and algorithms are evaluated using traces collected from a real car racing game as well as the real Internet latency data. The experimental results show that the proposed optimized bandwidth allocation scheme significantly reduces inconsistency while maintaining fairness among participants and that integrating the optimized scheme with our proposed relay setup algorithm further improves consistency.","PeriodicalId":299627,"journal":{"name":"2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121673905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}