Byzantine Fault-Tolerant Publish/Subscribe: A Cloud Computing Infrastructure
Tiancheng Chang, H. Meling. DOI: 10.1109/SRDS.2012.14

The emerging publish/subscribe communication paradigm for building large-scale distributed event notification systems has been shown to exhibit excellent performance and scalability characteristics, and some work has also focused on providing reliability and availability guarantees in the face of node crashes and link failures. Such publish/subscribe systems are commonly used in cloud computing infrastructures. However, the dependability concerns raised by malicious attacks and unintentional software errors, which can potentially corrupt the system, have largely been left untouched by researchers. In this paper, we first identify some of the potential problem areas related to Byzantine behavior in the publish/subscribe paradigm. Second, we propose several directions of research for designing a Byzantine fault-tolerant publish/subscribe system suitable for use as a cloud computing infrastructure.
Efficient and Reliable Multicast in Multi-radio Networks
R. Friedman, Alex Kogan. DOI: 10.1109/SRDS.2012.22

This paper investigates a novel, efficient approach that utilizes multiple radio interfaces to enhance the performance of reliable multicast from a single sender to a group of receivers. In the proposed scheme, one radio channel (and interface) is dedicated solely to transmissions of recovery information. We apply this concept to both ARQ and hybrid ARQ+FEC protocols, formally analyzing the number of packets each receiver needs to process under both our approach and the common single-channel approach. We also present a corresponding efficient protocol and study its performance by simulation. Both the formal analysis and the simulations demonstrate the benefits of our scheme.
Strategies for Reliable, Cloud-Based Distributed Real-Time and Embedded Systems
Kyoungho An. DOI: 10.1109/SRDS.2012.69

Cloud computing enables elastic and dynamic resource provisioning while providing cost-effective computing solutions. However, while cloud computing gives customers access to scalable and elastic resources, it does not guarantee the user's expectations of Quality of Service (QoS). This is because many customers share resources in the cloud infrastructure simultaneously: compute-intensive processes and network traffic associated with one customer often impact the performance of other applications operated on the same infrastructure in unexpected ways. The inability of the cloud to enforce QoS and provide execution guarantees prevents cloud computing from becoming useful for distributed, real-time and embedded (DRE) systems. Providing the required levels of service to support DRE systems in the cloud is complicated for several reasons: (1) the lack of effective monitoring, which prevents the timely auto-scaling that DRE systems need, (2) hypervisors and data-center networks that do not support real-time scheduling of resources, and (3) the absence of efficient and predictable fault-tolerance mechanisms with acceptable overhead and consistency. This paper describes ongoing and proposed doctoral research to address these challenges.
A Quantitative Comparison of Reactive and Proactive Replicated Storage Systems
Rossana Motta, J. Pasquale. DOI: 10.1109/SRDS.2012.1

Replicated storage systems allow their stored data objects to outlive the nodes storing them through replication. In this paper, we focus on durability, and more specifically on the concept of an object's lifetime, i.e., the duration between the creation of an object and the moment it becomes permanently irretrievable from the system. We analyze two main replication strategies: reactive, in which replication occurs in response to failures, and proactive, in which replication occurs in anticipation of failures. Our work presents a quantitative analysis that compares the reactive and proactive strategies through analytical models and simulations, considering exponentially distributed failures and reactive repairs alongside periodic proactive replications. We also derive the analytical formula for the variance of the lifetime in the reactive model. Our results indicate that a proactive strategy leads to storage requirements several times higher than a reactive strategy. In addition, reactive systems are only moderately bursty in terms of bandwidth consumption, with rare peaks of at most five times the bandwidth consumption of proactive systems (given input parameter values compatible with real systems). Finally, for both strategies, the standard deviation is very close to the expected lifetime, and consequently the lifetimes are close to being exponentially distributed.
Energy Efficient Hadoop Using Mirrored Data Block Replication Policy
Sara Arbab Yazd, S. Venkatesan, N. Mittal. DOI: 10.1109/SRDS.2012.25

The MapReduce scheme has become the state of the art for parallel processing of vast amounts of data in distributed systems. Hadoop, a popular open-source implementation of this technique, uses a data block replication mechanism to provide a reliable and fault-tolerant design. To maintain data availability, Hadoop takes into account the possibility of node and rack failures and therefore stores multiple copies of each data block. The current block placement policy randomly distributes the replicas over all servers, subject to constraints such as never storing two replicas of a data block on a single node. Our study proposes an efficient placement policy for data block replicas that can reduce the energy consumed in data centers. The proposed policy is built upon the covering subset (CovSet) method. The effectiveness of the approach is confirmed through simulations. Our experiments also show that the proposed method becomes more effective as the average number of data blocks per server increases, which matches the conditions found in practice.
Impact of Operational Reliability Re-assessment during Aircraft Missions
Kossi Tiassou, K. Kanoun, M. Kaâniche, C. Seguin, Chris Papadopoulos. DOI: 10.1109/SRDS.2012.37

This paper addresses aircraft mission operational reliability as it results from component failures, environment changes, and the maintenance facilities offered at the various stops involved in the mission. We show how the on-line assessment of operational reliability helps adjust an aircraft mission in case of major changes in equipment availability during the mission. The assessment is made possible by building and validating a generic dependability model that is easily i) processed for the assignment of an initial mission, and ii) updated during mission accomplishment, following the occurrence of specific major events. The generic model can be built as early as the design phase by engineers who are specialists in dependability assessment based on stochastic processes. Model update and processing during aircraft operation can be carried out by operators who are not necessarily familiar with the stochastic processes applied in this research. We present examples of results that show the valuable role of operational dependability re-assessment during aircraft missions.
Efficient Asynchronous Low Power Listening for Wireless Sensor Networks
R. Panta, James A. Pelletier, Gregg T. Vesonder. DOI: 10.1109/SRDS.2012.23

Energy conservation and reliable wireless communication are two crucial requirements of practical sensor networks. Radio duty cycling is a widely used mechanism to reduce the energy consumption of sensor devices and to increase the lifetime of the network. A side effect of radio duty cycling is that it can make wireless communication unreliable: if a sender transmits a packet while the receiver is asleep, the communication fails. Early duty cycling protocols like B-MAC, designed for bit-streaming radios, achieve a low duty cycle by keeping the radio transceiver awake only for short periods. However, they require a transmitter to precede each packet with a long preamble to ensure reliable delivery. Furthermore, they cannot be used with modern packet radios, such as the widely used IEEE 802.15.4-based transceivers, which cannot transmit arbitrarily long preambles. Recent duty cycling schemes like X-MAC, on the other hand, shorten the preamble and are designed to work with packet radios, but to ensure that a receiver can reliably detect a transmitter's preamble, they must keep the radio transceiver on for longer durations than early schemes like B-MAC. In this paper, we present a novel duty cycling scheme called Quick MAC that achieves a very low duty cycle without compromising communication reliability. Furthermore, Quick MAC is stateless, compatible with packet (and bit-stream) radios, and requires no synchronization among sensor nodes. In experiments using TMote Sky motes, we show that Quick MAC reduces the duty cycle by a factor of about four compared to X-MAC, while maintaining the same level of communication reliability.
RADAR: Adaptive Rate Allocation in Distributed Stream Processing Systems under Bursty Workloads
Ioannis Boutsis, V. Kalogeraki. DOI: 10.1109/SRDS.2012.55

In recent years, we have witnessed a proliferation of distributed stream processing systems that must operate under bursty workloads. Examples include road traffic control, processing of financial feeds, network monitoring, and real-time sensor data analysis. Meeting the QoS requirements of stream processing systems under bursty load is challenging. In this paper we present our approach for adaptive rate allocation within a distributed stream processing system to meet the end-to-end execution time and rate demands of the applications. Our algorithm determines the rates of the application components at runtime, with respect to the QoS constraints, to compensate for delays experienced by the components or to react to sudden bursts of load. Our technique is distributed and low-cost. Detailed experimental results over our Synergy middleware illustrate that the approach is practical, delivers good performance, and has low resource overhead.
Fair Comparison of Gossip Algorithms over Large-Scale Random Topologies
Ruijing Hu, Julien Sopena, L. Arantes, Pierre Sens, I. Demeure. DOI: 10.1109/SRDS.2012.28

We present a thorough performance comparison of three widely used probabilistic gossip algorithms over well-known random graphs. These graphs represent large-scale network topologies: the Bernoulli (or Erdős-Rényi) graph, the random geometric graph, and the scale-free graph. To conduct a fair comparison, particularly in terms of reliability, we propose a new parameter called the effectual fan out. For a given topology and gossip algorithm, the effectual fan out characterizes the mean dissemination power of infected sites. For large-scale networks, the effectual fan out thus has a strong linear correlation with message complexity. It enables an accurate analysis of the behavior of a gossip algorithm over a topology and simplifies the theoretical comparison of different gossip algorithms on that topology. Based on extensive experiments on top of the OMNeT++ simulator, which make use of the effectual fan out, we discuss the impact of topologies and gossip algorithms on performance, and how to combine them for the best gain in terms of reliability.
Availability-Based Methods for Distributed Storage Systems
Anne-Marie Kermarrec, E. L. Merrer, G. Straub, Alexandre van Kempen. DOI: 10.1109/SRDS.2012.10

Distributed storage systems rely heavily on redundancy to ensure data availability as well as durability. In networked systems subject to intermittent node unavailability, the level of redundancy introduced in the system should be minimized and maintained upon failures. Repairs are well known to be extremely bandwidth-consuming, and it has been shown that, without care, they may significantly congest the system. In this paper, we propose an approach to redundancy management that accounts for node heterogeneity with respect to availability. We show that by using the availability history of nodes, the performance of two important facets of distributed storage, replica placement and repair, can be significantly improved. Replicas are placed on nodes with complementary availability patterns, improving overall data availability. Repairs are scheduled using an adaptive per-node timeout based on node availability, decreasing the number of repairs while achieving comparable availability. We propose practical heuristics for both issues and evaluate our approach through extensive simulations based on real, well-known availability traces. The results clearly show the benefits of our approach with regard to the critical trade-off between data availability, load balancing, and bandwidth consumption.