In current multicore systems, cache memory is shared between multiple concurrent threads. Allocating the proper amount of cache to each thread is crucial to achieving high performance. Cache management in many existing systems is based on the least recently used (LRU) replacement policy, which can lead to adverse contention between threads for shared cache space. Cache partitioning is a technique that reserves a certain amount of cache for each thread, and has been shown to work well in practice. We introduce the problem of determining the optimal cache partitioning to minimize the makespan for completing a set of tasks. We analyze the problem using a model that generalizes a widely used empirical model for cache miss rates. Our first contribution is a mathematical characterization of the properties satisfied by an optimal partitioning. Second, we present an algorithm that finds a (1 + ε)-approximation to the optimal partitioning in O(n log(n/ε) log(n/(εp))) time, where n is the number of tasks and p is a value that depends on the optimal solution. We compare our algorithm with several partitioning schemes used in practice or proposed in the literature. Simulations show that our algorithm achieves a 22-59% better makespan than these algorithms.
Pan Lai and Rui Fan, "Makespan-Optimal Cache Partitioning," 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2013). doi:10.1109/MASCOTS.2013.28
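The abstract does not spell out the miss-rate model or the approximation algorithm; as a rough illustration only, the sketch below assumes a power-law miss-rate curve (a common empirical model) and uses a simple greedy heuristic, not the authors' method: repeatedly hand one spare cache unit to the task that currently dominates the makespan. All parameter names and values are hypothetical.

```python
def task_time(base, misses, alpha, cache, penalty=100):
    # Hypothetical power-law miss-rate model: miss cost scales as cache^(-alpha).
    return base + penalty * misses * cache ** (-alpha)

def partition_cache(tasks, total_cache):
    """Greedy heuristic (not the paper's algorithm): repeatedly give one
    cache unit to the task that currently dominates the makespan.
    Each task is a (base, misses, alpha) tuple."""
    alloc = [1] * len(tasks)  # start each task with one cache unit
    for _ in range(total_cache - len(tasks)):
        # index of the task with the largest current completion time
        worst = max(range(len(tasks)),
                    key=lambda i: task_time(*tasks[i], alloc[i]))
        alloc[worst] += 1
    makespan = max(task_time(*tasks[i], alloc[i]) for i in range(len(tasks)))
    return alloc, makespan
```

With concave miss-rate curves, this "reduce the current maximum" rule is a natural baseline against which an optimal partitioning can be compared.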
Like Zhou, Song Wu, Huahua Sun, Hai Jin, Xuanhua Shi
With the prevalence of multicore processors in computer systems, many soft real-time applications, such as media-based ones, use parallel programming models to better utilize hardware resources and possibly shorten response time. Meanwhile, virtualization technology is widely used in cloud data centers, and more and more cloud services, including such parallel soft real-time applications, run in virtualized environments. However, current hypervisors do not provide adequate support for them because of soft real-time constraints and synchronization problems, which result in frequent deadline misses and serious performance degradation. CPU schedulers in the underlying hypervisors are central to these issues. In this paper, we identify and analyze CPU scheduling problems in hypervisors, and propose a novel scheduling algorithm that considers both soft real-time constraints and synchronization problems. In our proposed method, real-time priority is introduced to accelerate event processing of parallel soft real-time applications, and a dynamic time slice is used to schedule virtual CPUs. In addition, all runnable virtual CPUs of virtual machines running parallel soft real-time applications are scheduled simultaneously to address synchronization problems. We implement a parallel soft real-time scheduler, named Poris, based on Xen. Our evaluation shows Poris can significantly improve the performance of parallel soft real-time applications. For example, compared to the Credit scheduler, Poris improves the performance of a media player by up to a factor of 1.35, and shortens the execution time of the PARSEC benchmark by up to 44.12%.
Like Zhou, Song Wu, Huahua Sun, Hai Jin and Xuanhua Shi, "Virtual Machine Scheduling for Parallel Soft Real-Time Applications," MASCOTS 2013. doi:10.1109/MASCOTS.2013.74
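Poris's internals are not given in the abstract; the toy picker below only illustrates the co-scheduling idea it describes, i.e. selecting all runnable vCPUs of a soft real-time VM together while other VMs are served one vCPU at a time. The data layout and function name are hypothetical.

```python
def pick_next(run_queues, realtime_vms):
    """Gang-style pick, a sketch of co-scheduling: if any VM marked as
    soft real-time has runnable vCPUs, schedule all of them together
    so synchronizing threads make progress simultaneously; otherwise
    pick a single vCPU from the first VM with work pending."""
    for vm, vcpus in run_queues.items():
        if vm in realtime_vms and vcpus:
            return vm, list(vcpus)  # co-schedule every runnable vCPU
    for vm, vcpus in run_queues.items():
        if vcpus:
            return vm, [vcpus[0]]  # ordinary VMs get one vCPU at a time
    return None, []
```

A real hypervisor scheduler would additionally account for credits and time slices; this sketch captures only the simultaneous-dispatch decision.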
Taking photos of landmarks is a favorite and popular way for travellers to keep memories of places they have visited. Community-contributed photo collections, such as those on Flickr, provide us with an opportunity to gain a more in-depth understanding of a landmark's visual appeal. While much current research focuses on recommending which representative photos should be selected from such pervasive photo sources, our work aims to find where a visitor can capture his or her own beautiful, personal photo of a queried landmark. We believe that this aspect of helping users to take memorable photos has not been well studied. We propose a method to recommend a list of shooting locations that have the utmost potential for capturing appealing photos of a landmark of interest. A Gaussian mixture model based clustering approach is applied to the camera locations from an existing photo repository, generating a set of regions each of which covers an area with sufficient semantics, e.g., a route section. These camera locations are scored and ranked through multiple criteria, including their potential for better visual aesthetics, overall social attractiveness, popularity, etc. Additionally, we investigate the temporal characteristics of these locations by considering the spatio-temporal space. A number of different recommendations are generated from these results, such as the best camera positions at different times throughout a single day, or the best visiting time in the same spatial area. Subjective evaluation studies have been conducted, and they indicate that our work generates promising results.
Y. Zhang and Roger Zimmermann, "Camera Shooting Location Recommendations for Landmarks in Geo-space," MASCOTS 2013. doi:10.1109/MASCOTS.2013.25
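The abstract names the ranking criteria but not the scoring function; a minimal sketch of the multi-criteria ranking step might look as follows, with the GMM clustering step omitted and the criteria assumed to be pre-normalized to [0, 1]. The field names and weights are illustrative, not the paper's.

```python
def location_score(loc, weights=(0.5, 0.3, 0.2)):
    """Weighted multi-criteria score for a candidate shooting location.
    Criteria (aesthetics, social attractiveness, popularity) are assumed
    normalized to [0, 1]; the weights are hypothetical."""
    w_aes, w_soc, w_pop = weights
    return (w_aes * loc["aesthetics"]
            + w_soc * loc["social"]
            + w_pop * loc["popularity"])

def recommend(locations, k=3):
    """Return the top-k candidate shooting locations by score."""
    return sorted(locations, key=location_score, reverse=True)[:k]
```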
Metadata snapshots are a common method for gaining insight into file systems due to their small size and relative ease of acquisition. Since they are static, most researchers have used them for relatively simple analyses such as file size distributions and age of files. We hypothesize that it is possible to gain much richer insights into file system and user behavior by clustering features in metadata snapshots and comparing the entropy within clusters to the entropy within natural partitions such as directory hierarchies. We discuss several different methods for gaining deeper insights into metadata snapshots, and show a small proof of concept using data from Los Alamos National Laboratories. In our initial work, we see evidence that it is possible to identify user locality information, traditionally the purview of dynamic traces, using a single static snapshot.
Avani Wildani, I. Adams and E. L. Miller, "Single-Snapshot File System Analysis," MASCOTS 2013. doi:10.1109/MASCOTS.2013.47
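The entropy comparison described above can be sketched concretely: compute the Shannon entropy of a metadata feature (here, file owner) within each natural partition such as a top-level directory. The toy snapshot, paths, and the choice of owner as the feature are all hypothetical; the paper's actual feature set and clustering method are not specified in the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of categorical labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy metadata snapshot: (path, owner) pairs.
snapshot = [
    ("/home/alice/a.txt", "alice"), ("/home/alice/b.txt", "alice"),
    ("/home/bob/c.txt", "bob"),     ("/scratch/run1.dat", "alice"),
    ("/scratch/run2.dat", "bob"),   ("/scratch/run3.dat", "carol"),
]

def partition_entropy(snapshot):
    """Owner-entropy within each top-level directory partition.
    Low entropy means the partition is homogeneous for that feature."""
    parts = {}
    for path, owner in snapshot:
        parts.setdefault(path.split("/")[1], []).append(owner)
    return {d: entropy(owners) for d, owners in parts.items()}
```

Comparing such per-partition entropies against the entropy of clusters found by feature clustering is the kind of contrast the paper proposes.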
Peak power shaving allows data center providers to increase their computational capacity without exceeding a given power budget. Recent papers establish that machines may repurpose energy from uninterruptible power supplies (UPSs) to maintain power budgets during peak demand. Our paper demonstrates that existing studies overestimate cost savings by as much as 3.35x because they rely on simple battery reliability models and Boolean (all-or-nothing) battery discharge, and neglect the design and cost of battery system communication in state-of-the-art distributed UPS designs. We propose an architecture where batteries provide only a fraction of the data center power, exploiting nonlinear battery capacity properties to achieve longer battery life and longer peak shaving durations. This architecture demonstrates that a centralized UPS with partial discharge reduces cost sufficiently that double power conversion losses are not a limiting factor, contradicting recent trends in warehouse-scale distributed UPS design. Our architecture increases battery lifetime by 78%, doubles the cost savings compared to the distributed design (corresponding to $75K/month savings for a 10MW data center) and reduces decision coordination latency by 4x relative to state-of-the-art distributed designs.
Baris Aksanli, Eddie Pettis and T. Simunic, "Architecting Efficient Peak Power Shaving Using Batteries in Data Centers," MASCOTS 2013. doi:10.1109/MASCOTS.2013.32
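The abstract's "nonlinear battery capacity properties" are not detailed; the classic model for this effect is Peukert's law, under which runtime shrinks superlinearly with discharge current, so drawing only a fraction of the peak from batteries delivers more total charge per discharge. This is only an illustration of the general phenomenon, not the paper's model, and every numeric value below is made up.

```python
def peukert_runtime(capacity_ah, rated_hours, current_a, k=1.15):
    """Peukert's law: runtime = H * (C / (I * H))^k, where C is the rated
    capacity (Ah) at rated discharge time H (hours) and I is the actual
    current (A). k ~= 1.1-1.3 for lead-acid cells; k = 1 is an ideal cell."""
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k

# Hypothetical 100 Ah battery rated over 20 hours.
full = peukert_runtime(100, 20, current_a=50)     # batteries cover the full peak
partial = peukert_runtime(100, 20, current_a=10)  # batteries cover a fraction of it
```

At the lower current, the battery not only runs longer but delivers more total charge (runtime * current), which is why partial discharge extends peak-shaving duration.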
Write amplification poses endurance challenges to NAND flash-based solid state disks (SSDs), degrading their write endurance and lifetime. A large write amplification consumes program/erase cycles (P/Es) of the NAND flash chips and reduces the endurance and performance of SSDs. Write amplification is mainly caused by garbage collection, wear-leveling, metadata updates, and mapping table updates, and is defined as the ratio of the data volume written by the SSD controller to the data volume written by the host. In this paper, we propose a four-level model of write amplification for SSDs, covering the channel, chip, die, and plane levels. In light of this model, we design a method for analyzing the write amplification of SSDs to trace SSD endurance and performance by incorporating the Ready/Busy (R/B) signal of NAND flash. Our practical approach measures write amplification for an entire SSD rather than for individual NAND flash chips. To validate our measurement technique and model, we implement a verified SSD (vSSD) system and perform a cross-comparison on a set of SSDs stressed by micro-benchmarks and I/O traces. A new method is adopted in our measurements to study the R/B signals of the NAND flash chips in an SSD. Experimental results show that our model is accurate and the measurement technique is generally applicable to any SSD.
Hui Sun, X. Qin, Fei Wu and C. Xie, "Measuring and Analyzing Write Amplification Characteristics of Solid State Disks," MASCOTS 2013. doi:10.1109/MASCOTS.2013.29
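The definition of write amplification given in the abstract is easy to make concrete. The byte counts below are hypothetical; they merely show how garbage-collection copies and mapping updates inflate the controller's write volume relative to the host's.

```python
def write_amplification(controller_bytes, host_bytes):
    """WA = data volume written by the SSD controller / data volume
    written by the host (per the definition in the abstract)."""
    return controller_bytes / host_bytes

# Hypothetical accounting: the host writes one 4 KiB page, but garbage
# collection relocates two more 4 KiB pages and the FTL rewrites a
# 4 KiB mapping page.
host = 4096
controller = 4096 + 2 * 4096 + 4096  # host write + GC copies + mapping update
wa = write_amplification(controller, host)  # 4.0 in this toy scenario
```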
This paper describes a method to obtain symbolic solutions of large stochastic models using Gauss-Jordan elimination. Such a solution is an efficient alternative to standard simulations, and it allows fast and exact solution of very large and complex models that are hard to handle even with iterative numerical methods. The proposed method assumes the system is described as a structured (modular) Markovian system with discrete states for each system module and transitions among those states governed by Markovian processes. The mathematical representation of such a system is given by a Kronecker (tensor) formula, i.e., a tensor formulation of small matrices representing each system module's transitions and occasional dependencies among modules. Preliminary results indicate the expected efficiency of the proposed solution.
Paulo Fernandes, Lucelene Lopes and S. Yeralan, "Symbolic Solution of Kronecker-Based Structured Markovian Models," MASCOTS 2013. doi:10.1109/MASCOTS.2013.62
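The Kronecker formulation above composes small per-module matrices into the transition structure of the full system without ever storing the global matrix monolithically. The operator itself is a few lines; this sketch shows only the Kronecker product on dense lists-of-lists, not the paper's Gauss-Jordan solution procedure.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of lists.
    Entry (i, j) of the result is a[i // rb][j // cb] * b[i % rb][j % cb],
    so an ra x ca and an rb x cb matrix compose into (ra*rb) x (ca*cb):
    the state space of two modules combines multiplicatively."""
    ra, ca = len(a), len(a[0])
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(ca * cb)]
            for i in range(ra * rb)]
```

For example, two 2-state modules yield a 4-state composed system, and chaining `kron` over all module matrices reproduces the tensor formula's structure.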
Zhenjiang Dong, Jun Wang, G. Riley, S. Yalamanchili
There has been little research on the effect of partitioning on parallel simulation of multicore systems. This paper presents our study of this important problem in the context of the Null-message-based synchronization algorithm for parallel multicore simulation. We focus on coarse-grained parallel simulation, where each core and its cache slices are modeled within a single logical process (LP) and different partitioning schemes are applied only to the interconnection network. We show that encapsulating the entire on-chip interconnection network in a single logical process is an impediment to scalable simulation. This baseline partitioning and two other schemes are investigated. Experiments are conducted on a subset of the PARSEC benchmarks with 16-, 32-, 64- and 128-core models. Results show that the partitioning scheme has a significant impact on simulation performance and parallel efficiency. Beyond a certain system scale, one scheme consistently outperforms the other two, and the performance and efficiency gaps increase with the size of the model, with up to 4.1 times faster speed and 277% better efficiency for 128-core models. We explain the reasons for this behavior, which can be traced to the features of the Null-message-based synchronization algorithm. Because of this, we believe that if a component has an increasing number of inter-LP interactions with increasing system size, it should be partitioned into several sub-components to achieve better performance.
Zhenjiang Dong, Jun Wang, G. Riley and S. Yalamanchili, "A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems," MASCOTS 2013. doi:10.1109/MASCOTS.2013.55
In recent years, cloud storage systems have emerged as the primary solution for online storage and information sharing. Thanks to efficient storage and bandwidth utilization, erasure codes and network coding have proven effective in providing fault tolerance and fast content retrieval in cloud storage systems. In a nutshell, coded blocks are distributed among storage nodes, and file retrieval is accomplished by downloading sufficiently many coded blocks from any group of storage nodes. However, because each coded block depends on the original file, even a single-byte update invalidates all coded blocks in the system. In this paper, we introduce DeltaNC, a new differential update algorithm that keeps all coded blocks in a network-coding-based cloud storage system synchronized by transmitting only the changes to the file. Our experimental results, from a trace-driven simulator, show that DeltaNC significantly reduces bandwidth and CPU usage, with performance comparable to that of the Diff program, the common tool for updating files.
M. R. Zakerinasab and Mea Wang, "DeltaNC: Efficient File Updates for Network-Coding-Based Cloud Storage Systems," MASCOTS 2013. doi:10.1109/MASCOTS.2013.52
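DeltaNC's algorithm is not spelled out in the abstract, but the linearity that makes differential updates possible can be sketched. For simplicity this uses coding over GF(2), where coefficients are bits and addition is XOR; real systems typically code over larger fields such as GF(2^8). Because encoding is linear, XOR-ing a coded block with the (coefficient-scaled) difference between the old and new data block yields exactly the re-encoded result, so only the change needs to travel.

```python
def xor(a, b):
    """Bytewise XOR of two equal-length byte strings (addition in GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks, coeffs):
    """Coded block over GF(2): XOR of the blocks whose coefficient is 1."""
    out = bytes(len(blocks[0]))
    for blk, c in zip(blocks, coeffs):
        if c:
            out = xor(out, blk)
    return out

def apply_delta(coded, coeff, old_block, new_block):
    """Differential update: propagate only (old XOR new) into the coded
    block, instead of re-encoding the whole file."""
    return xor(coded, xor(old_block, new_block)) if coeff else coded
```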
Researchers and manufacturers are currently putting considerable effort into designing, improving and deploying the Internet of Things, which involves large numbers of constrained, low-cost embedded devices deployed at scale with low power consumption, low bandwidth and limited communication range. For instance, a network of sensors distributed throughout a building can monitor the temperature in different offices. This kind of architecture is generally centralized, as the sensors are mainly programmed to periodically transmit their data to the sink. The IPv6 Routing Protocol for Low-power and Lossy Networks (RPL) was designed to enable such communications, and support for point-to-point traffic is also available. However, new applications may require peer-to-peer communication between arbitrary nodes of the network. In that case RPL is not optimal, as data packets are forwarded along longer paths with larger metrics. In this paper we study the effectiveness of RPL compared to a shortest-path algorithm such as Dijkstra's algorithm. We analyze peer-to-peer communications in random wireless sensor network topologies of up to 250 nodes, corresponding to a reasonable cluster size. We have built a dedicated simulation environment named Network Analysis and Routing eVALuation (NARVAL). This toolbox generates random topologies in order to study the impact of routing algorithms on the effectiveness of communication protocols. In our work, we first generated many random network topologies and selected a sink node in each. We built the Destination Oriented Directed Acyclic Graph (DODAG) rooted at the chosen sink according to the RPL algorithm. We then computed the paths between every pair of distinct sensor nodes and compared them to the corresponding shortest paths obtained by Dijkstra's algorithm. This approach yields statistics on the path extension of RPL relative to Dijkstra's algorithm. We also analyzed the impact of the sink position and the network size on this path extension.
F. Melakessou and T. Engel, "Path Extension Analysis of Peer-to-Peer Communications in Small 6LoWPAN/RPL Sensor Networks," 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, 2013. doi:10.1109/MASCOTS.2013.40
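The measurement described above can be sketched in a few lines (this is not the NARVAL toolbox, and a BFS tree rooted at the sink stands in for a real RPL DODAG; the six-node ring topology is a hypothetical example): route peer-to-peer traffic up the tree until the two branches meet, then down, and divide the hop count by the true shortest-path length to get the path extension (stretch).

```python
from collections import deque

def bfs_levels(graph, root):
    """Breadth-first search returning parent and depth maps from `root`."""
    parent, depth = {root: None}, {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                q.append(v)
    return parent, depth

def tree_hops(parent, depth, a, b):
    """Hop count when a packet climbs toward the root until the two
    branches meet, then descends: the tree-routing path from a to b."""
    hops = 0
    while depth[a] > depth[b]:
        a, hops = parent[a], hops + 1
    while depth[b] > depth[a]:
        b, hops = parent[b], hops + 1
    while a != b:
        a, b, hops = parent[a], parent[b], hops + 2
    return hops

# A 6-node ring with the sink at node 0 (hypothetical topology).
graph = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
parent, depth = bfs_levels(graph, 0)   # sink-rooted tree
_, shortest = bfs_levels(graph, 2)     # hop distances from node 2
stretch = tree_hops(parent, depth, 2, 4) / shortest[4]
print(stretch)  # tree route 2-1-0-5-4 (4 hops) vs shortest 2-3-4 (2 hops)
```

Averaging this ratio over all node pairs and many random topologies gives exactly the kind of path-extension statistic the study reports, and repeating the experiment with different sink placements exposes the sink-position effect.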