2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06): Latest Articles

MOve: Design of An Application-Malleable Overlay

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.33

Sébastien Monnet, Ramsés Morales, Gabriel Antoniu, Indranil Gupta

Peer-to-peer overlays allow distributed applications to work in a wide-area, scalable, and fault-tolerant manner. However, most structured and unstructured overlays in the literature today are inflexible from the application's viewpoint: the application has no control over the structure of the overlay itself. This paper proposes the concept of an application-malleable overlay and presents the design of the first malleable overlay, which we call MOve. In MOve, the communication characteristics of the distributed application using the overlay can influence the overlay's structure itself, with the twin goals of (1) optimizing application performance by adapting the overlay, while (2) retaining the large scale and fault tolerance of the overlay approach. This influence can either be specified explicitly by the application or gleaned implicitly by our algorithms. Besides neighbor-list membership management, MOve also contains algorithms for resource discovery, update propagation, and churn resistance. The emergent behavior of the implicit mechanisms used in MOve manifests as follows: when application communication is low, most overlay links keep their default configuration; as the application's communication characteristics become more evident, however, the overlay gracefully adapts itself to the application.
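The abstract does not spell out MOve's algorithms, so the following is only a toy illustration of the emergent behavior it describes, not MOve itself. It assumes a fixed-size neighbor list in which a hypothetical number of `adaptive_slots` may be handed to peers whose observed message count reaches a hypothetical `threshold`; all other slots keep a random default configuration. The class and parameter names are ours.

```python
import random
from collections import Counter


class MalleableNeighborList:
    """Toy neighbor list: most slots stay random, a few follow traffic.

    Hypothetical parameters (not from MOve): `size` slots in total, at most
    `adaptive_slots` of which go to peers with >= `threshold` messages.
    """

    def __init__(self, size, adaptive_slots, threshold, candidates, seed=0):
        self.size = size
        self.adaptive_slots = adaptive_slots
        self.threshold = threshold
        self.rng = random.Random(seed)
        self.traffic = Counter()
        # Default configuration: a random sample of known peers.
        self.default = self.rng.sample(list(candidates), size)

    def record_message(self, peer):
        # Implicit mode: glean communication characteristics from traffic.
        self.traffic[peer] += 1

    def neighbors(self):
        # Peers whose traffic reaches the threshold, busiest first.
        hot = [p for p, c in self.traffic.most_common() if c >= self.threshold]
        chosen = hot[: self.adaptive_slots]
        # Fill the remaining slots from the default (random) configuration.
        for p in self.default:
            if len(chosen) == self.size:
                break
            if p not in chosen:
                chosen.append(p)
        return chosen
```

With little traffic, `neighbors()` returns exactly the default list; once a peer's message count crosses the threshold, it takes one of the adaptive slots while the rest of the list stays random, which loosely mirrors the "graceful adaptation" described above.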

Citations: 6

Weakly-Persistent Causal Objects in Dynamic Distributed Systems

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.47

R. Baldoni, M. Malek, A. Milani, S. Piergiovanni

In the context of clients accessing a read/write shared object, persistency of a written value is the property that a value written into the object remains available unless it is overwritten by a subsequent write operation. This property is easy to guarantee in a static distributed system, provided that either some subset of the processes implementing the object never crashes, or crashed processes can recover and retrieve their last state. Unfortunately, enforcing this property in a potentially large-scale, dynamic distributed system (e.g., a P2P system) is far from trivial when the processes implementing the object may fail or leave at any time without notifying any other process (i.e., their last state might not be retrievable). The paper introduces the notion of weak persistency, which guarantees persistency of values once the system becomes quiescent (i.e., arrivals and departures subside). An implementation of a weakly-persistent object ensuring causal consistency is provided, along with its correctness proof. The appeal of causal consistency is that, unlike atomic consistency, it can be maintained even during non-quiescent periods of the distributed system (i.e., when persistency is not guaranteed).
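The abstract does not describe how the weakly-persistent object enforces causal consistency. For readers unfamiliar with the notion, here is the textbook vector-clock rule for causal delivery, offered as a generic sketch and not as the paper's implementation: a write from process `sender` with vector timestamp `msg_clock` may be applied only if it is the next write from that sender and every write it depends on has already been applied locally. The helper names are hypothetical.

```python
def causally_deliverable(msg_clock, sender, local_clock):
    """Textbook vector-clock test for causal delivery (illustrative only)."""
    for k, (m, l) in enumerate(zip(msg_clock, local_clock)):
        if k == sender:
            if m != l + 1:  # must be the next write from this sender
                return False
        elif m > l:  # depends on a write not yet applied locally
            return False
    return True


def deliver_all(local_clock, pending):
    """Apply buffered (name, sender, clock) writes in causal order."""
    clock = list(local_clock)
    waiting = list(pending)
    order = []
    progress = True
    while waiting and progress:
        progress = False
        for msg in list(waiting):
            name, sender, msg_clock = msg
            if causally_deliverable(msg_clock, sender, clock):
                clock[sender] += 1
                order.append(name)
                waiting.remove(msg)
                progress = True
    return order, clock
```

For example, if a replica receives `w2` from process 1 (clock `[1, 1, 0]`) before the write `w1` from process 0 (clock `[1, 0, 0]`) that it depends on, `deliver_all` buffers `w2` and applies `w1` first.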

Citations: 6

PLATO: Predictive Latency-Aware Total Ordering

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.36

M. Balakrishnan, K. Birman, Amar Phanishayee

PLATO is a predictive total ordering protocol designed for low-latency multicast in datacenters. It predicts out-of-order arrival of multicast packets by observing their inter-arrival times, and delays a packet before passing it up to the application only if it believes the packet has arrived in the wrong order. We show through experiments on real datacenter-style networks that the inter-arrival time of consecutive packet pairs is an excellent predictor of out-of-order delivery. We evaluate an implementation of PLATO on the Emulab testbed and show that it reduces delivery latency by more than a factor of 2 compared to the fixed-sequencer protocol.
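To make the "delay only when reordering is predicted" idea concrete, here is a minimal sketch. It assumes, as an illustration rather than a detail taken from the abstract, that a gap shorter than a hypothetical `threshold_us` predicts out-of-order arrival, and that such packets are held for a hypothetical `hold_us` before delivery; every other packet is delivered at once. Contrast this with a fixed-sequencer protocol, in which every packet waits for the sequencer's ordering decision.

```python
class GapPredictor:
    """Hold a packet only when its inter-arrival gap looks suspicious.

    Assumption (not from the paper): a gap shorter than `threshold_us`
    predicts reordering; such packets are held for `hold_us`.
    """

    def __init__(self, threshold_us, hold_us):
        self.threshold_us = threshold_us
        self.hold_us = hold_us
        self.last_arrival_us = None

    def hold_time(self, arrival_us):
        """Return how long (in microseconds) to hold this packet."""
        if self.last_arrival_us is None:
            gap = None  # first packet: nothing to compare against
        else:
            gap = arrival_us - self.last_arrival_us
        self.last_arrival_us = arrival_us
        if gap is not None and gap < self.threshold_us:
            return self.hold_us  # predicted out of order: wait briefly
        return 0  # predicted in order: deliver immediately
```

For arrivals at 0, 1000, 1020 and 2000 µs with a 50 µs threshold, only the packet arriving 20 µs after its predecessor is held, so the common in-order case pays no extra latency.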

Citations: 12

DRIFT: Efficient Message Ordering in Ad Hoc Networks Using Virtual Flooding

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.18

Stefan Pleisch, Thomas Clouser, Mikhail Nesterenko, A. Schiper

We present DRIFT, a total order multicast algorithm for ad hoc networks with mobile or static nodes. Because of the ad hoc nature of the network, DRIFT uses flooding for message propagation. The key idea of DRIFT is virtual flooding: a way of using unrelated message streams to propagate message causality information in order to accelerate message delivery. We describe DRIFT in detail and evaluate its performance both in a simulator and on a wireless sensor network. In both cases, our results demonstrate that DRIFT outperforms the simple total order multicast algorithm designed for wired networks on which it is based. In large-scale simulations, for certain experimental settings, DRIFT achieved a speedup of several orders of magnitude.
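The abstract does not name the wired-network algorithm DRIFT builds on. As an assumed stand-in, here is the textbook symmetric total-order scheme based on Lamport timestamps, assuming FIFO channels and strictly increasing per-process timestamps. A pending message is delivered once every other process has been heard from at a later (timestamp, id) pair. Calls with `payload=None` model traffic from unrelated streams; they carry no application data but still advance what is known about the sender's clock, which is loosely the intuition behind virtual flooding. Class and method names are hypothetical.

```python
import heapq


class SymmetricTotalOrder:
    """Lamport-timestamp total order delivery (textbook symmetric scheme)."""

    def __init__(self, processes):
        self.latest = {p: 0 for p in processes}  # highest timestamp heard from p
        self.pending = []  # min-heap of (timestamp, sender, payload)

    def receive(self, timestamp, sender, payload=None):
        # Any message from `sender`, even one on an unrelated stream,
        # shows that the sender's clock has reached `timestamp`.
        self.latest[sender] = max(self.latest[sender], timestamp)
        if payload is not None:
            heapq.heappush(self.pending, (timestamp, sender, payload))
        return self._deliverable()

    def _deliverable(self):
        out = []
        while self.pending:
            ts, sender, payload = self.pending[0]
            # Safe once every other process is known to be past (ts, sender),
            # so no message that orders earlier can still arrive (FIFO assumed).
            if all((self.latest[p], p) > (ts, sender)
                   for p in self.latest if p != sender):
                heapq.heappop(self.pending)
                out.append(payload)
            else:
                break
        return out
```

In a three-process run, a message from `a` stays blocked until `c` is heard from; a payload-free message from `c` is enough to release it.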

Citations: 2

Systematic composition and analyzability of dependable networked embedded computing systems

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.45

K. Kim

In the past decade, the application field of networked embedded computing systems (NECSs) has grown at a gradually accelerating pace. A great majority of NECSs are subject to some fault tolerance requirements; in other words, in their application environments, the fault rates of some components (e.g., communication links, batteries, certain chips) are not negligible. Research interest in fault-tolerant (FT) NECSs has therefore grown over the past decade, in sharp contrast to the relatively stagnant interest in more traditional branches of fault-tolerant computing (FTC), such as FT data servers. In particular, object-/component-oriented (OO/CO) structuring techniques for real-time (RT) distributed computing systems have been established in highly promising forms, with convincing demonstrations, although they are still spreading through industry very slowly. Such OO/CO structuring techniques not only enable efficient design of real-time computing (RTC) application systems with considerably reduced labor and improved maintainability, but also enable the design of complex systems with analyzable yet tight response-time bounds. On this foundation for designing RTC systems, research in FT NECSs can now proceed with much greater effectiveness and confidence.

Citations: 1

Cryptree: A Folder Tree Structure for Cryptographic File Systems

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.15

D. Grolimund, Luzius Meisser, S. Schmid, Roger Wattenhofer

We present Cryptree, a cryptographic tree structure that facilitates access control in file systems operating on untrusted storage. Cryptree leverages the file system's folder hierarchy to achieve efficient and intuitive, yet simple, access control. Its highlights are the ability to recursively grant access to a folder and all its subfolders in constant time; the dynamic inheritance of access rights, which inherently prevents access rights from becoming scattered; and the possibility of granting someone access to a file or folder without revealing the identities of other accessors. To reason about and visualize Cryptree, we introduce the notion of cryptographic links. We describe the Cryptrees we have used to enforce read and write access in our own file system. Finally, we measure the performance of Cryptree and compare it to other approaches.
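One way to picture a "cryptographic link" and the constant-time recursive grant is the sketch below. It assumes that a link from folder A to subfolder B is simply B's key wrapped under A's key, so handing someone a single folder key lets them derive every key below it, but never a key above it. The wrapping is a deliberately toy XOR-with-SHA-256 construction chosen only to stay within the standard library; it is **not** secure and is not Cryptree's actual scheme, which the abstract does not specify. A real system would use a vetted authenticated cipher. All names are hypothetical.

```python
import hashlib
import secrets


def _keystream(key, nonce):
    # Toy keystream: one SHA-256 block, just enough to wrap a 32-byte key.
    return hashlib.sha256(key + nonce).digest()


def make_link(parent_key, child_key):
    """Toy 'cryptographic link': child_key wrapped under parent_key (NOT secure)."""
    nonce = secrets.token_bytes(16)
    wrapped = bytes(a ^ b for a, b in zip(child_key, _keystream(parent_key, nonce)))
    return nonce, wrapped


def follow_link(parent_key, link):
    nonce, wrapped = link
    return bytes(a ^ b for a, b in zip(wrapped, _keystream(parent_key, nonce)))


class Folder:
    def __init__(self, name):
        self.name = name
        self.key = secrets.token_bytes(32)
        self.children = {}  # child name -> (Folder, link from self to child)

    def add(self, child):
        self.children[child.name] = (child, make_link(self.key, child.key))
        return child


def reachable_keys(folder, key):
    """All folder keys derivable from `key` by following links downward."""
    found = {folder.name: key}
    for child, link in folder.children.values():
        found.update(reachable_keys(child, follow_link(key, link)))
    return found
```

Granting access to `docs` and everything beneath it means handing over the single key `docs.key`, regardless of how many subfolders exist; because links only point downward, that key reveals nothing about the parent folder's key.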

Citations: 68

Reliably Executing Tasks in the Presence of Untrusted Entities

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.40

Antonio Fernández, Luis López, Agustín Santos, Chryssis Georgiou

In this work we consider a distributed system formed by a master processor and a collection of n processors (workers) that can execute tasks; the worker processors are untrusted and might act maliciously. The master assigns tasks to workers for execution. Each task returns a binary value, and we want the master to accept only correct values with high probability. Furthermore, we assume that the service provided by the workers is not free: for each task assigned to a worker, the master is charged one work unit. Therefore, considering a single task assigned to several workers, our goal is for the master to accept the correct value of the task with high probability, using the smallest possible amount of work (i.e., the number of workers to which the master assigns the task). We explore two ways of bounding the number of faulty processors: (a) a fixed bound f < n/2 on the maximum number of workers that may fail, and (b) a probability p < 1/2 that any given processor is faulty (each processor is faulty with probability p, independently of the others). Our work demonstrates that it is possible to obtain a high probability of correct acceptance with little work. In particular, considering both mechanisms for bounding the number of malicious workers, we first show lower bounds on the minimum amount of (expected) work required for any algorithm to accept the correct value with probability of success 1 − ε, where ε ≪ 1 (e.g., ε = 1/n). We then develop and analyze two algorithms, each using a different decision strategy, and show that both achieve the same probability of success 1 − ε while requiring similar upper bounds on the (expected) work. Furthermore, under certain conditions, these upper bounds are asymptotically optimal with respect to our lower bounds.
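To see how the work needed grows with the target 1 − ε under model (b), consider the simplest baseline: assign the task to n workers and accept the strict-majority answer. This is not one of the paper's two algorithms. It assumes the worst case in which every faulty worker returns the wrong bit, so the vote is correct exactly when at most ⌊(n − 1)/2⌋ workers are faulty, a binomial tail probability. The function names are hypothetical.

```python
from math import comb


def majority_success(n, p):
    """P(strict majority of n workers is correct), each faulty w.p. p independently.

    Worst case assumed: every faulty worker returns the wrong binary value.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range((n - 1) // 2 + 1))


def min_workers(p, eps, max_n=10_001):
    """Smallest odd n (the work) whose majority vote succeeds w.p. >= 1 - eps."""
    for n in range(1, max_n + 1, 2):
        if majority_success(n, p) >= 1 - eps:
            return n
    return None
```

For example, with p = 0.1 a single worker is correct with probability 0.9 and three workers with probability 0.972; reaching 1 − ε = 0.99 takes five workers (success probability about 0.991).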

Citations: 28

Call Availability Prediction in a Telecommunication System: A Data Driven Empirical Approach

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.12

G. A. Hoffmann, M. Malek

Availability prediction in a telecommunication system plays a crucial role in its management, either by alerting the operator to potential failures or by proactively initiating preventive measures. In this paper, we apply linear (ARMA, multivariate, random walk) and nonlinear (Radial and Universal Basis Functions) regression techniques to recognize system failures and to predict the system's call availability up to 15 minutes in advance. Second, we introduce a novel nonlinear modeling technique for call availability prediction. We benchmark all five techniques against each other. The modeling methods applied are data-driven rather than analytical and can handle large amounts of data. We apply them to real data from a commercial telecommunication platform. The data used for modeling includes: a) time-stamped, event-based log files; and b) continuously measured system states. Results are reported as a) the area under the receiver operating characteristic curve (AUC) for classifying system states as failure or non-failure, and b) a cost-benefit analysis. Our findings suggest: a) a high degree of nonlinearity in the data; b) statistically significant improvements in forecasting performance and cost-benefit ratio for the nonlinear modeling techniques; and c) that log file data does not improve model performance with any of the modeling techniques.
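The AUC used to score the failure/non-failure classifiers has a simple pairwise reading: it is the probability that a randomly chosen failure state receives a higher predicted score than a randomly chosen non-failure state, with ties counting one half (equivalent to the normalized Mann-Whitney U statistic). A minimal sketch follows; the scores and labels in the usage note are invented for illustration, and the quadratic loop favors clarity over speed.

```python
def roc_auc(scores, labels):
    """AUC = P(random failure scored above random non-failure); ties count 0.5.

    `labels` are truthy for failure states and falsy for non-failure states.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one failure and one non-failure sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A predictor that ranks every failure above every non-failure gets an AUC of 1.0; for scores `[0.9, 0.2, 0.3, 0.1]` with labels `[1, 1, 0, 0]`, three of the four failure/non-failure pairs are ordered correctly, giving 0.75.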

Citations: 48

Proactive Resilience Revisited: The Delicate Balance Between Resisting Intrusions and Remaining Available

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.37

Paulo Sousa, N. Neves, P. Veríssimo, W. Sanders

In a recent paper, we presented proactive resilience as a new approach to proactive recovery, based on architectural hybridization. We showed that, with appropriate assumptions about the fault rate, proactive resilience makes it possible to build distributed intrusion-tolerant systems that are guaranteed not to suffer more than the assumed number of faults during their lifetime. In this paper, we explore the impact of these assumptions in asynchronous systems and derive conditions that practical systems should meet in order to guarantee long-lived (i.e., available) intrusion-tolerant operation. Our conclusions are based on analytical and simulation results obtained with models implemented in Mobius, and we use the same modeling environment to show that our approach offers higher resilience than other proactive intrusion-tolerant system models.

Citations: 27

Reducing the Availability Management Overheads of Federated Content Sharing Systems

2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)

Pub Date : 2006-10-02 DOI: 10.1109/SRDS.2006.39

Christopher Peery, Thu D. Nguyen, Francisco Matias Cuenca-Acuna

We consider the problem of ensuring high data availability in federated content sharing systems. Ideally, such a system would provide high data availability in a device-transparent manner, so that users do not face the time-consuming and error-prone task of managing data replicas across the system's constituent devices. We propose a novel unified availability model and a decentralized replication algorithm to approximate this ideal. Our availability model addresses three different concerns: availability during connected operation (online), availability during disconnected operation (offline), and availability after permanent disconnection from the federated system (ownership). Our replication algorithm centers on the intuition that devices should selfishly use their local storage to ensure offline and ownership availability for their individual owners, while excess storage is used communally to ensure high online availability for all shared content. Evaluation of an implementation shows that our algorithm rapidly reaches stable and communally desirable configurations when there is sufficient space. As space becomes highly constrained, however, the system approaches a non-cooperative configuration in which devices only hoard content to serve their individual owners' needs, consistent with the fact that devices in a federated system are owned by different users.
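The "selfish first, communal with the excess" intuition can be illustrated with a greedy, single-device storage plan. This is a toy sketch, not the paper's decentralized replication algorithm: it assumes a device knows its capacity, its owner's items in priority order, the sizes of shared items, and a hypothetical count of how many replicas of each shared item already exist elsewhere.

```python
def plan_storage(capacity, own_items, shared_items, replica_counts):
    """Toy single-device plan: owner's items first, spare space to shared items.

    own_items / shared_items: dict name -> size (own_items in priority order).
    replica_counts: hypothetical dict name -> replicas already held elsewhere.
    """
    plan, used = [], 0
    # Selfish phase: secure offline and ownership availability for the owner.
    for name, size in own_items.items():
        if used + size <= capacity:
            plan.append(name)
            used += size
    # Communal phase: spend any excess on the least-replicated shared content.
    for name in sorted(shared_items, key=lambda n: replica_counts.get(n, 0)):
        size = shared_items[name]
        if name not in plan and used + size <= capacity:
            plan.append(name)
            used += size
    return plan
```

With ample capacity the device also hosts the least-replicated shared item; when capacity shrinks to just the owner's own data, the communal phase finds no room and the device ends up hoarding only its owner's content, echoing the non-cooperative configuration described above.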

Citations: 4
