
2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2): Latest Publications

Get Your Head Out of the Clouds: The Illusion of Confidentiality & Privacy
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00015
V. Urias, W. Stout, Caleb Loverro, B. V. Leeuwen
The cloud has been leveraged for many applications across different industries. Despite its popularity, cloud technologies are still immature. The security implications of cloud computing also dominate the research space. Many confidentiality- and integrity-based (C-I) security controls concerning data-at-rest and data-in-transit are focused on encryption. In a world where social-media platforms transparently gather data about user behaviors and user interests, the need for user privacy and data protection is of the utmost importance. However, how can a user know that his data is safe, that her data is secure, that his data's integrity is upheld, and be confident that her communications reach only the intended recipients? We propose: they can't. Many threats have been hypothesized in the shared-service arena, with many solutions formulated to avert those threats; however, we illustrate that many technologies and standards supporting C-I controls may be ineffective, not just against adversarial actors but also against trusted entities. Service providers and malicious insiders can intercept and decrypt network- and host-based data without any guest or user knowledge.
Citations: 1
Message from the SC2 2018 General and Program Chairs
Pub Date: 2018-11-01 | DOI: 10.1109/sc2.2018.00005
{"title":"Message from the SC2 2018 General and Program Chairs","authors":"","doi":"10.1109/sc2.2018.00005","DOIUrl":"https://doi.org/10.1109/sc2.2018.00005","url":null,"abstract":"","PeriodicalId":340244,"journal":{"name":"2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127874753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hera Object Storage: A Seamless, Automated Multi-Tiering Solution on Top of OpenStack Swift
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00011
R. Hoppli, T. Bohnert, L. Militano
Over the last couple of decades, the demand for storage in the cloud has grown exponentially. Distributed cloud storage and object storage for the increasing share of unstructured data are in high focus in both academic and industrial research. At the same time, efficient storage and the corresponding costs are often contrasting parameters, raising a trade-off problem for any proposed solution. To this end, classifying data in terms of access probability has become a hot topic. This paper introduces Hera Object Storage, a storage system built on top of OpenStack Swift that aims at selecting the most appropriate storage tier for any object to be stored. The goal of the multi-tiering storage we propose is to be automated and seamless, guaranteeing the required storage performance at the lowest possible cost. The paper discusses the design challenges and the proposed algorithmic solutions and, based on a prototype implementation, presents a basic proof-of-concept validation.
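The tiering idea in this abstract, classifying each object by access probability so that only sufficiently hot objects occupy the faster and more expensive tiers, can be pictured with the minimal sketch below. The tier names, thresholds, and the 30-day sliding-window rate estimator are assumptions made for the sketch, not the policy implemented in Hera Object Storage.

```python
import time

# Hypothetical tier thresholds (accesses per day); illustrative values only.
TIERS = [
    ("hot",  0.5),   # frequently accessed objects stay on fast, expensive storage
    ("warm", 0.05),
    ("cold", 0.0),   # everything else goes to the cheapest tier
]

def estimated_access_rate(access_timestamps, window_days=30):
    """Estimate accesses per day from an object's recent access history."""
    horizon = time.time() - window_days * 86400
    recent = [t for t in access_timestamps if t >= horizon]
    return len(recent) / window_days

def select_tier(access_timestamps):
    """Return the hottest tier whose access-rate threshold the object meets."""
    rate = estimated_access_rate(access_timestamps)
    for name, threshold in TIERS:
        if rate >= threshold:
            return name
    return TIERS[-1][0]

# An object accessed three times in the last month is classified as "warm".
history = [time.time() - days * 86400 for days in (2, 9, 20)]
print(select_tier(history))  # -> warm
```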
Citations: 1
Dynamic Scheduling for Seamless Computing
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00013
Ludwig Mittermeier, Florian Katenbrink, A. Seitz, Harald Mueller, B. Brügge
The data generated by Internet of Things (IoT) devices is constantly increasing. Using cloud computing to process this data is associated with network congestion, incurs high latency and does not provide context awareness. Fog computing has been proposed as a solution to overcome these problems. The heterogeneous, distributed nature of fog and edge nodes requires a system that facilitates the development and deployment of applications and the management of nodes in a cluster. Seamless computing is an extension of fog computing that respects the mobility and heterogeneity of nodes. In industry scenarios where cost, energy or network-latency optimization is preferred at runtime, static deployment is not sufficient. Production systems require availability, fault tolerance and extensibility, but not at the expense of usability. This research examines the requirements of a system for dynamic rescheduling of software components in a distributed, heterogeneous fog computing cluster at runtime. We propose Dynamic Scheduling for Seamless Computing (DYSCO) as a solution and present a concept implementation based on Kubernetes. We describe the configuration of Kubernetes for DYSCO, extend it with a monitoring tool and enhance the scheduler to enable dynamic component rescheduling. We evaluate the requirements of DYSCO with test cases for an industry-specific scenario in a fog computing cluster. The cluster runs a safety-critical application that must react immediately to machine failures and an application that processes employee data for analytics. We show that DYSCO is able to reschedule software components at runtime while ensuring technology independence, availability, fault tolerance and usability.
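Not the DYSCO implementation itself, but a minimal sketch of the kind of runtime rescheduling decision the abstract describes: score candidate nodes on cost, energy and network latency, and move a component only when another node is clearly better, so placements do not oscillate. The node attributes, weights and hysteresis margin are invented for illustration; in DYSCO such a decision would be enforced through the extended Kubernetes scheduler.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cost: float        # monetary cost, normalized to 0..1
    energy: float      # energy use, normalized to 0..1
    latency_ms: float  # network latency to the data source

def score(node: Node, weights=(0.4, 0.3, 0.3), latency_scale=100.0) -> float:
    """Weighted placement cost for a component on a node; lower is better."""
    w_cost, w_energy, w_latency = weights
    return (w_cost * node.cost
            + w_energy * node.energy
            + w_latency * min(node.latency_ms / latency_scale, 1.0))

def choose_node(nodes, current: str, hysteresis=0.1) -> str:
    """Reschedule only if another node beats the current one by a clear margin,
    so that placements do not oscillate between similar nodes."""
    best = min(nodes, key=score)
    current_node = next(n for n in nodes if n.name == current)
    if score(best) < score(current_node) - hysteresis:
        return best.name   # in a real cluster this would trigger eviction and re-creation
    return current

nodes = [Node("edge-1", cost=0.2, energy=0.6, latency_ms=5.0),
         Node("cloud-1", cost=0.5, energy=0.3, latency_ms=60.0)]
print(choose_node(nodes, current="cloud-1"))  # -> edge-1
```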
Citations: 7
Enabling RETE Algorithm for RDFS Reasoning on Apache Spark
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00028
H. Ju, Sangyoon Oh
Semantic web technology has been used to help various kinds of software, including intelligent personal assistants, by acquiring new data or deriving knowledge from the relations between data. However, it is hard to apply current semantic web schemes such as RDFS reasoning to real-world data because of the huge volume of data that needs to be processed. In this study, we design and enable RDFS reasoning with the RETE algorithm on Apache Spark in a parallel fashion. In addition, we apply rule-sequence ordering optimizations from existing studies to enhance processing performance. The empirical experimental results verify that the implementation of our design shows strong scalability. However, the current naïve approach of using the distinct function provided by Spark to deduplicate data should be improved to yield better processing performance. In future work, we will investigate new deduplication methods.
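For orientation, here is a minimal PySpark sketch (assuming a local Spark installation) of a single RDFS entailment rule, rdfs9, expressed as a join over triple RDDs, including the distinct()-based deduplication that the abstract identifies as the bottleneck. It is not the authors' RETE implementation, and a full RDFS closure would iterate rules like this to a fixpoint.

```python
# Minimal PySpark sketch of one RDFS entailment rule, rdfs9:
#   (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdfs9-sketch")

triples = sc.parallelize([
    ("alice",   "rdf:type",        "Student"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person",  "rdfs:subClassOf", "Agent"),
])

type_triples = (triples
                .filter(lambda t: t[1] == "rdf:type")
                .map(lambda t: (t[2], t[0])))      # key: class C, value: instance x
subclass = (triples
            .filter(lambda t: t[1] == "rdfs:subClassOf")
            .map(lambda t: (t[0], t[2])))          # key: class C, value: superclass D

# Join on the shared class C, emit the inferred (x rdf:type D) triples, then
# deduplicate with distinct(), the step the abstract identifies as the bottleneck.
# A full RDFS closure would repeat such joins until no new triples appear.
inferred = (type_triples
            .join(subclass)
            .map(lambda kv: (kv[1][0], "rdf:type", kv[1][1]))
            .distinct())

print(inferred.collect())   # [('alice', 'rdf:type', 'Person')]
sc.stop()
```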
Citations: 2
Accelerating the Computation of Multi-Objectives Scheduling Solutions for Cloud Computing
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00014
C. Cérin, Tarek Menouer, M. Lebbah
This paper presents two practical Large Scale Multi-Objectives Scheduling (LSMOS) strategies proposed for cloud computing environments. The goal is to address the problems of companies that manage a large cloud infrastructure with thousands of nodes and would like to optimize the scheduling of the many requests submitted online by users. In our context, requests submitted by users are configured according to multi-objective criteria, such as the number of CPUs used and the amount of memory used. The novelty of our strategies is to select effectively, from the large set of nodes forming the cloud computing platform, a node to execute the user request such that this node offers a good compromise among a large set of multi-objective criteria. In this paper, we first show the limits, in terms of performance, of exact solutions. Second, we introduce approximate algorithms in order to deal with problems that are high-dimensional in terms of the number of nodes and the number of criteria. The two proposed scheduling strategies are based on the exact Kung multi-objective decision algorithm combined with either the k-means clustering algorithm or an LSH (random-projection-based) hashing algorithm. Experiments with our new strategies demonstrate the potential of our approach under different scenarios.
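As a point of reference for the exact strategy, the sketch below shows plain non-dominated (Pareto) filtering over per-node criteria to be minimized. It uses a simple O(n^2) filter rather than Kung's divide-and-conquer algorithm, the node names and criteria values are invented, and the k-means and LSH acceleration steps from the paper are not reproduced.

```python
# Plain non-dominated (Pareto) filtering over per-node criteria to be minimized.
def dominates(a, b):
    """a dominates b if a is no worse on every criterion and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the non-dominated candidates; each candidate is (name, criteria)."""
    return [(name, crit) for name, crit in candidates
            if not any(dominates(other, crit) for _, other in candidates if other != crit)]

nodes = [
    ("node-a", (0.7, 0.2, 0.10)),   # (cpu load, memory load, price per hour)
    ("node-b", (0.4, 0.5, 0.12)),
    ("node-c", (0.8, 0.6, 0.15)),   # dominated by node-a, so it is filtered out
]
print(pareto_front(nodes))          # node-a and node-b remain as candidate hosts
```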
Citations: 3
A Security Proxy to Cloud Storage Backends Based on an Efficient Wildcard Searchable Encryption
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00026
Shen-Ming Chung, Ming-Der Shieh, T. Chiueh
Cloud storage backends such as Amazon S3 are a potential storage solution for enterprises. However, to couple enterprises with these backends, at least two problems must be solved: first, how to make these semi-trusted backends as secure as on-premises storage; and second, how to retrieve files selectively as easily as with on-premises storage. A security proxy can address both problems by building a local index from keywords in files before encrypting and uploading the files to these backends. But if the local index is built in plaintext, file content is still vulnerable to malicious local staff. Searchable Encryption (SE) can remove this vulnerability by turning the index into ciphertext; however, its known constructions often require modifications to the index database and, when it comes to supporting wildcard queries, they are not efficient at all. In this paper, we present a security proxy that, based on our wildcard SE construction, can securely and efficiently couple enterprises with these backends. In particular, since our SE construction works directly with existing database systems, it incurs only a little overhead and, when needed, permits the security proxy to run with a constantly small storage footprint by readily outsourcing all built indices to existing cloud databases.
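To make the proxy idea concrete, the following sketch builds a keyed, exact-match searchable index: keywords are replaced by HMAC tokens so neither the index nor the backend stores plaintext keywords. This shows only the generic searchable-indexing pattern under simplified key handling; it is not the paper's wildcard SE construction, and all names in it are hypothetical.

```python
# Keyed, exact-match searchable index: keywords are replaced by HMAC tokens so that
# neither the local index nor the cloud backend stores plaintext keywords.
import hashlib
import hmac
from collections import defaultdict

INDEX_KEY = b"demo-secret-key"        # placeholder; a real proxy would protect this key

def keyword_token(keyword: str) -> str:
    return hmac.new(INDEX_KEY, keyword.lower().encode(), hashlib.sha256).hexdigest()

index = defaultdict(set)              # token -> names of (already encrypted) objects

def index_file(object_name: str, keywords):
    for kw in keywords:
        index[keyword_token(kw)].add(object_name)

def search(keyword: str):
    return sorted(index.get(keyword_token(keyword), set()))

index_file("report-2018.docx.enc", ["cloud", "budget", "q3"])
index_file("minutes.pdf.enc", ["budget", "meeting"])
print(search("budget"))   # ['minutes.pdf.enc', 'report-2018.docx.enc']
```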
Citations: 2
SC2 2018 Organizing Committee
Pub Date: 2018-11-01 | DOI: 10.1109/sc2.2018.00006
{"title":"SC2 2018 Organizing Committee","authors":"","doi":"10.1109/sc2.2018.00006","DOIUrl":"https://doi.org/10.1109/sc2.2018.00006","url":null,"abstract":"","PeriodicalId":340244,"journal":{"name":"2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121626175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Balanced Partitioning Mechanism Using Collapsed-Condensed Trie in MapReduce
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00020
Hsing-Lung Chen, Syu-Huan Chen
MapReduce has emerged as an efficient platform for coping with big data. It achieves this goal by decoupling the data and then distributing the workload to multiple reducers for processing in a fully parallel manner. Zipf's law asserts that, for many types of data studied in the physical and social sciences, the frequency of any event is inversely proportional to its rank in the frequency table, i.e. the key distribution is skewed. For skewed data, however, the hash function of MapReduce usually generates unbalanced workloads across the reducers. Unbalanced workloads across reducers degrade the performance of MapReduce significantly, because the overall running time of a map-reduce cycle is determined by the longest-running reducer. Thus, it is an important issue to develop a balanced partitioning algorithm which partitions the workload evenly over all the reducers. This paper proposes a balanced partitioning mechanism with a collapsed-condensed trie in MapReduce, which evenly distributes the workload to the reducers. The collapsed-condensed trie is introduced to capture the data statistics faithfully; it requires a reasonable amount of memory and incurs a small running overhead. We then propose a quasi-optimal packing algorithm to assign sub-partitions to the reducers evenly, thereby reducing the total execution time. Experiments using inverted indexing on real-world datasets are conducted to evaluate the performance of our proposed partitioning mechanism.
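The packing step can be pictured with a classic longest-processing-time greedy: repeatedly give the heaviest remaining sub-partition to the currently lightest reducer. The sketch below assumes invented sub-partition weights (in practice they would come from the trie's frequency statistics) and is not the paper's quasi-optimal packing algorithm.

```python
# Longest-processing-time greedy: give the heaviest remaining sub-partition to the
# currently lightest reducer, tracked with a min-heap keyed on reducer load.
import heapq

def pack(subpartition_weights, num_reducers):
    heap = [(0, r, []) for r in range(num_reducers)]   # (load, reducer id, assigned ids)
    heapq.heapify(heap)
    for idx, weight in sorted(enumerate(subpartition_weights), key=lambda p: -p[1]):
        load, reducer, assigned = heapq.heappop(heap)  # lightest reducer so far
        assigned.append(idx)
        heapq.heappush(heap, (load + weight, reducer, assigned))
    return sorted(heap, key=lambda entry: entry[1])    # order results by reducer id

# Skewed sub-partition weights, standing in for the trie's key-frequency statistics.
weights = [50, 20, 10, 8, 5, 4, 2, 1]
for load, reducer, assigned in pack(weights, 3):
    print(f"reducer {reducer}: load={load}, sub-partitions={assigned}")
```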
Citations: 1
A Novel Automated Tiered Storage Architecture for Achieving Both Cost Saving and QoE
Pub Date: 2018-11-01 | DOI: 10.1109/SC2.2018.00012
Ryo Irie, Shuuichirou Murata, Ying-Feng Hsu, Morito Matsuoka
With the exponential growth of data from ICT equipment and the continued development of low-cost storage technology, the scale and amount of data are continually increasing in many areas and moving throughout the cloud. However, most of this data is accessed infrequently. Data temperature describes the frequency of data access: hot storage is dedicated to storing frequently accessed data, while cold storage is designed for infrequently accessed data. In this paper, we propose and implement an architecture for an automated tiered storage system that optimizes data allocation in data centers. Our proposed approach brings mutual benefits to both service providers and end users. Users do not need to consider which storage medium their data is saved to and accessed from, and service providers do not need to analyze data access or classify data manually. By successfully predicting infrequently accessed data and moving it to cold storage, we obtain significant cost savings. While benefiting from storage cost savings, we also ensure quality of experience through the correctness of the predicted hot data. The operational strategy varies among cloud storage service providers, so we characterize different scenarios and provide customized optimal solutions.
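One common way to realize a data-temperature signal like the one described here is an exponentially decayed access counter; the sketch below is a generic illustration with an assumed half-life and cold threshold, not the prediction model evaluated in the paper.

```python
import time

HALF_LIFE_DAYS = 7.0     # assumed decay half-life for the access score
COLD_THRESHOLD = 0.5     # assumed score below which an object is migrated to cold storage

class Temperature:
    """Exponentially decayed access counter used as a hot/cold signal."""

    def __init__(self):
        self.score = 0.0
        self.last_update = time.time()

    def _decay(self, now):
        elapsed_days = (now - self.last_update) / 86400.0
        self.score *= 0.5 ** (elapsed_days / HALF_LIFE_DAYS)
        self.last_update = now

    def record_access(self, now=None):
        now = time.time() if now is None else now
        self._decay(now)
        self.score += 1.0

    def is_cold(self, now=None):
        now = time.time() if now is None else now
        self._decay(now)
        return self.score < COLD_THRESHOLD

# An object accessed twice and then left idle for a month falls below the threshold.
t = Temperature()
start = time.time()
t.record_access(start)
t.record_access(start + 3600)
print(t.is_cold(start + 30 * 86400))   # True -> candidate for migration to cold storage
```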
Citations: 5