2013 IEEE 5th International Conference on Cloud Computing Technology and Science: Latest Articles


Affordable and Energy-Efficient Cloud Computing Clusters: The Bolzano Raspberry Pi Cloud Cluster Experiment

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.121

P. Abrahamsson, S. Helmer, Nattakarn Phaphoom, L. Nicolodi, Nick Preda, Lorenzo Miori, Matteo Angriman, Juha Rikkilä, Xiaofeng Wang, Karim Hamily, Sara Bugoloni

We present our ongoing work building a Raspberry Pi cluster consisting of 300 nodes. The unique characteristics of this single-board computer pose several challenges, but also offer a number of interesting opportunities. On the one hand, a single Raspberry Pi can be purchased cheaply and has low power consumption, which makes it possible to create an affordable and energy-efficient cluster. On the other hand, it lacks computing power, which makes it difficult to run computationally intensive software on it. Nevertheless, by combining a large number of Raspberry Pis into a cluster, this drawback can be (partially) offset. Here we report on the first important steps of creating our cluster: how to set up and configure the hardware and the system software, and how to monitor and maintain the system. We also discuss potential use cases for our cluster, the two most important being an inexpensive and green test bed for cloud computing research and a robust and mobile data center for operating in adverse environments.



Citations: 106

VOLUME: Enable Large-Scale In-Memory Computation on Commodity Clusters

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.15

Zhiqiang Ma, David Ke Hong, Lin Gu

Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience and related work indicate that simply substituting distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome this challenge, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory that unifies the physical memory and disk resources on many compute nodes to form a system-wide data substrate. The new substrate provides a general memory-based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, and delivers 6-11x speedups on the adjacency list workload. VOLUME is faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. For k-means clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.
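The idea of a memory-based abstraction that transparently swaps data to disk can be illustrated with a small single-process sketch. This is not VOLUME's design or API; the class name `SpillStore`, its methods, and the least-recently-used eviction policy are all assumptions chosen only to show the general principle of keeping hot data in memory and spilling the rest.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillStore:
    """Hypothetical key-value store: keeps at most `capacity` items in
    memory and spills least-recently-used items to files on disk."""

    def __init__(self, capacity, spill_dir=None):
        self.capacity = capacity
        # Assumption: a private temp directory stands in for "disk".
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="spill_")
        self.mem = OrderedDict()  # ordered oldest -> most recently used

    def _path(self, key):
        # Assumption: keys are short, filename-safe strings.
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def _evict(self):
        # Move the least-recently-used items to disk until we fit.
        while len(self.mem) > self.capacity:
            old_key, old_value = self.mem.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_value, f)

    def put(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        path = self._path(key)
        if os.path.exists(path):
            os.remove(path)  # drop a stale spilled copy
        self._evict()

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        # Not in memory: swap it back in from disk.
        # Raises FileNotFoundError if the key was never stored.
        path = self._path(key)
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self.mem[key] = value
        self._evict()
        return value
```

The caller only ever uses `put` and `get`; whether a value currently lives in memory or on disk is hidden, which is the property the abstract describes as being transparent to programmers.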



Citations: 7

Enabling Virtual, Cloud-Based Sensors in Assisted Living Environments

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.74

Martin Franke, A. Wuttig, T. Schlegel

In this work, we combine cloud computing with assisted living to improve activity classification. The proposed system addresses the challenge of enabling heterogeneous, cloud-based sensors in small home environments. With a semantic and model-based approach, we seamlessly integrate Web services as virtual sensors, thereby increasing the accuracy of human activity classifiers. We solve the most pressing issue, interoperability, by using generated wrapper classes and semantically unifying all heterogeneous sensor events within the system. Our approach is evaluated with a test installation against the requirements for Sensor Web infrastructures combined with those for assisted living environments.


Citations: 5

Varanus: In Situ Monitoring for Large Scale Cloud Systems

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.164

Jonathan Stuart Ward, A. Barker

Monitoring is an essential aspect of maintaining and developing computer systems, and its difficulty grows in proportion to the size of the system. The need for robust monitoring tools has become more evident with the advent of cloud computing. Infrastructure as a Service (IaaS) clouds allow end users to deploy vast numbers of virtual machines as part of dynamic and transient architectures. Current monitoring solutions, including many of those in the open-source domain, rely on outdated concepts such as manual configuration and centralised data collection, and adapt poorly to membership churn. In this paper we propose the development of a cloud monitoring system to provide scalable and robust lookup, data collection and analysis services for large-scale cloud systems. In lieu of centrally managed monitoring, we propose a multi-tier architecture using a layered gossip protocol to aggregate monitoring information and facilitate lookup, information collection and the identification of redundant capacity. This allows for a resource-aware data collection and storage architecture that operates over the system being monitored. This in turn enables monitoring to be done in situ without the need for significant additional infrastructure to facilitate monitoring services. We evaluate this approach against alternative monitoring paradigms and demonstrate how our solution is well adapted to usage in a cloud-computing context.
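Gossip-based aggregation in general can be sketched in a few lines. The example below is a single-layer, push-pull averaging simulation, not the layered protocol the paper proposes; the function name `gossip_average` and its parameters are assumptions for illustration. Each node repeatedly exchanges its value with a random peer and both keep the mean, so every node converges toward the global average without any central collector.

```python
import random


def gossip_average(values, rounds=30, seed=0):
    """Simulate push-pull gossip averaging over a fully connected set of
    nodes. Returns each node's final local estimate of the global mean.
    Hypothetical helper; requires at least two nodes."""
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            # Pick a random peer j different from i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both nodes adopt the pairwise mean; the total is conserved,
            # so the global average never changes.
            mean = (state[i] + state[j]) / 2.0
            state[i] = mean
            state[j] = mean
    return state


if __name__ == "__main__":
    cpu_load = [10.0, 20.0, 30.0, 40.0]  # made-up per-node metric
    print(gossip_average(cpu_load))
```

Because each exchange involves only two nodes, the cost per node stays constant as the system grows, which is the usual motivation for preferring gossip over centralised collection.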



Citations: 26

Built Heritage Digitization: Opportunities Afforded by Emerging Cloud Based Applications

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.109

Jonathan Scott, R. Laing, Graeme Hogg

This paper concerns the use of cloud-based photogrammetry to collect and process 3D information pertaining to the built heritage. Although modern surveying technologies such as laser scanning are available, their cost and the need for specialist knowledge may rule them out in all but the largest projects. The paper uses a specific case study, that of a small vernacular building, to illustrate how affordable, cloud-based photogrammetry can be used to record buildings. Much previous work has concentrated on the use of mobile, cloud-based 3D surveying technology to record small objects, whereas this paper illustrates an application within the architectural heritage. Recording the intact built heritage in this way provides a rich data set, and one which can be readily incorporated within mainstream building information modelling packages.


Citations: 7

Return on Security Investment for Cloud Platforms

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.115

Nikolaos Tsalis, M. Theoharidou, D. Gritzalis

Cloud migration is a complex decision because of the multiple parameters that weigh for or against it (e.g. available budget, costs, performance). One of these parameters is information security and the investment required to ensure it. A potential client needs to evaluate various deployment options and Cloud Service Providers (CSPs). This paper proposes a set of metrics focused on assessing the security controls of a cloud deployment in terms of cost and mitigation. Such an approach can help the client decide whether or not to deploy part of her services, data or infrastructure to a CSP.
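The abstract does not state the paper's metrics, so the sketch below uses the commonly cited general formulation of return on security investment instead: ROSI = (ALE × mitigation ratio − cost) / cost, where ALE is the annual loss expectancy. The function name `rosi`, its parameter names, and the figures in the usage example are assumptions for illustration only.

```python
def rosi(ale, mitigation_ratio, control_cost):
    """Return on security investment, in its commonly cited form.

    ale              -- annual loss expectancy without the control
    mitigation_ratio -- fraction of that loss the control prevents (0..1)
    control_cost     -- annual cost of the control (must be > 0)
    """
    if control_cost <= 0:
        raise ValueError("control_cost must be positive")
    if not 0.0 <= mitigation_ratio <= 1.0:
        raise ValueError("mitigation_ratio must be between 0 and 1")
    # Loss avoided by the control, net of what the control costs,
    # expressed relative to that cost.
    return (ale * mitigation_ratio - control_cost) / control_cost


if __name__ == "__main__":
    # Made-up numbers: 100,000 expected annual loss, a control that
    # mitigates 80% of it and costs 20,000 per year.
    print(rosi(100_000, 0.8, 20_000))
```

A positive value means the control is expected to save more than it costs; comparing the value across deployment options gives one simple way to weigh cost against mitigation.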


Citations: 18

Appliance Management for Federated Cloud Environments

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.38

M. Airaj, Christophe Blanchet, S. Kenny, C. Loomis

Cloud infrastructures provide compelling features for scientific and engineering applications. Federated clouds additionally promise improved scalability via access to a larger pool of resources and improved service availability through geographically distributed redundant servers. Effective use of federated clouds requires the creation of portable appliances and consistent appliance management techniques. The StratusLab Marketplace, a platform-agnostic appliance registry, facilitates appliance management in a federated environment. This paper describes the Marketplace design goals, implementation, and security concerns. It also covers the planned improvements based on our experience of running this service in production for more than two years.


Citations: 1

The Who, What, Why, and How of High Performance Computing in the Cloud

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CLOUDCOM.2013.47

Abhishek K. Gupta, L. Kalé, F. Gioachin, Verdi March, Chun Hui Suen, Bu-Sung Lee, P. Faraboschi, R. Kaufmann, D. Milojicic

Cloud computing is emerging as an alternative to supercomputers for some of the high-performance computing (HPC) applications that do not require a fully dedicated machine. With cloud as an additional deployment option, HPC users are faced with the challenges of dealing with highly heterogeneous resources, where the variability spans a wide range of processor configurations, interconnections, virtualization environments, and pricing rates and models. In this paper, we take a holistic viewpoint to answer the question - why and who should choose cloud for HPC, for what applications, and how should cloud be used for HPC? To this end, we perform a comprehensive performance evaluation and analysis of a set of benchmarks and complex HPC applications on a range of platforms, varying from supercomputers to clouds. Further, we demonstrate HPC performance improvements in cloud using alternative lightweight virtualization mechanisms - thin VMs and OS-level containers, and hypervisor- and application-level CPU affinity. Next, we analyze the economic aspects and business models for HPC in clouds. We believe this is an important area that has not been sufficiently addressed by past research. Overall results indicate that current public clouds are cost-effective only at small scale for the chosen HPC applications, when considered in isolation, but can complement supercomputers using business models such as cloud burst and application-aware mapping.



Citations: 53

Automatic Fault Diagnosis in Cloud Infrastructure

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CloudCom.2013.68

Qian Zhu, Teresa Tung, Qing Xie

With cloud computing, a cycle of fault diagnosis and recovery becomes the norm. A large amount of monitoring data and log events is available, but it is hard to figure out which events or metrics are critical in fault diagnosis. Other approaches model faults as a deviation from normal behavior, and thus are less applicable in the cloud, where changes in the environment may affect what is considered normal. In this work, we propose an adaptive and flexible fault diagnosis framework to automatically identify the key fault indicators and detect fault patterns. Leveraging ideas from social media, we represent the hierarchical relationships among metrics and events as well as how they relate to faults. We apply the EdgeRank algorithm to decide the key events that contribute to a fault. Our approach works across different environments to detect potential faults. We evaluated our framework on a cloud-based enterprise system with a list of injected faults that range from environmental (e.g. virtual machine or network) to application degradation. We considered both private and public clouds. Our solution achieves over 90% detection accuracy with modest overhead. A comparison shows that our approach is more accurate than alternative approaches in the literature.
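EdgeRank, as publicly described for social news feeds, scores an item by summing affinity × weight × time decay over its edges. The abstract does not say how the paper maps these terms onto metrics and events, so the sketch below is only one plausible reading: the `Edge` fields, the exponential half-life decay, and the event names are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    event: str       # name of the monitored event (hypothetical)
    affinity: float  # how strongly the event is tied to the fault
    weight: float    # importance assigned to this kind of event
    age_s: float     # seconds since the event was observed


def edge_rank(edges, half_life_s=60.0):
    """Rank events by the sum of affinity * weight * time decay.
    Assumption: decay halves an edge's contribution every half_life_s."""
    scores = {}
    for e in edges:
        decay = 0.5 ** (e.age_s / half_life_s)
        scores[e.event] = scores.get(e.event, 0.0) + e.affinity * e.weight * decay
    # Highest-scoring events first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    observed = [
        Edge("disk_full", affinity=1.0, weight=2.0, age_s=0.0),
        Edge("cpu_spike", affinity=1.0, weight=1.0, age_s=0.0),
        Edge("disk_full", affinity=0.5, weight=2.0, age_s=60.0),
    ]
    for event, score in edge_rank(observed):
        print(event, score)
```

Older observations count for less, so the ranking favors events that are both strongly associated with the fault and recent.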



Citations: 15

MELA: Monitoring and Analyzing Elasticity of Cloud Services

2013 IEEE 5th International Conference on Cloud Computing Technology and Science

Pub Date : 2013-12-02 DOI: 10.1109/CLOUDCOM.2013.18

D. Moldovan, G. Copil, Hong Linh Truong, S. Dustdar

Cloud computing has enabled a wide array of applications to be exposed as elastic cloud services. While the number of such services has rapidly increased, there is a lack of techniques for supporting cross-layered multi-level monitoring and analysis of elastic service behavior. In this paper we introduce novel concepts, namely elasticity space and elasticity pathway, for understanding elasticity of cloud services, and techniques for monitoring and evaluating them. We present MELA, a customizable framework, which enables service providers and developers to analyze cross-layered, multi-level elasticity of cloud services, from the whole cloud service to service units, based on service structure dependencies. Besides support for real-time elasticity analysis of cloud service behavior, MELA provides several customizable features for extracting functions and patterns that characterize that behavior. To illustrate the usefulness of MELA, we conduct several experiments with a realistic data-as-a-service in an M2M cloud platform.


Citations: 44
