Storage characterization for unstructured data in online services applications

2009 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2009-10-04 DOI:10.1109/IISWC.2009.5306786

S. Sankar, Kushagra Vaid


Citations: 17

Abstract

Mega datacenters hosting large-scale web services have unique workload attributes that must be taken into account for optimal service scalability. Provisioning compute and storage resources to provide a seamless user experience is challenging because customer traffic loads vary widely across time and geographies, and the servers hosting these applications have to be rightsized to deliver performance both within a single server and across a scale-out cluster. Typical user-facing web services have a three-tiered hierarchy: front-end web servers, middle-tier application logic, and a back-end data storage and processing layer. In this paper, we address the challenge of disk subsystem design for back-end servers hosting large amounts of unstructured (also called blob) data. Typical content hosted on such servers includes user-generated content such as photos, email messages, videos, and social networking updates. The specific server applications analyzed in this paper are the message store of a large-scale email application, image tile storage for a large-scale geo-mapping application, and user content storage for Web 2.0 type applications. We analyze the storage subsystems for these web services in a live production environment and provide an overview of the disk traffic patterns and access characteristics of each application. We then explore time-series characteristics and derive probabilistic models showing state transitions between locations on the data volumes for these applications. Next, we explore how these probabilistic models could be extended into a framework for synthetic benchmark generation for such applications. Finally, we discuss how this framework can be used for storage subsystem rightsizing for optimal scalability of such back-end storage clusters.
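The abstract gives no detail on how the probabilistic models are built, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes a first-order Markov chain over equal-sized regions of a data volume, estimated from an observed sequence of byte offsets, and then sampled to produce a synthetic sequence of regions. The function names `fit_transitions` and `generate_regions`, the equal-sized partitioning, and the sample offsets are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(offsets, volume_size, num_regions):
    """Estimate transition probabilities between volume regions.

    Assumption: the volume is split into num_regions equal-sized regions
    and consecutive accesses form a first-order Markov chain.
    """
    # Map each offset to a region index, clamping the upper edge.
    regions = [min(off * num_regions // volume_size, num_regions - 1)
               for off in offsets]
    counts = defaultdict(Counter)
    for src, dst in zip(regions, regions[1:]):
        counts[src][dst] += 1
    # Normalize each row of counts into probabilities.
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def generate_regions(transitions, start, length, seed=0):
    """Sample a synthetic sequence of region indices from the model."""
    rng = random.Random(seed)
    state = start
    sequence = [state]
    for _ in range(length - 1):
        row = transitions.get(state)
        if not row:
            # Region was never observed as a source; stop early.
            break
        states = list(row)
        weights = [row[s] for s in states]
        state = rng.choices(states, weights=weights)[0]
        sequence.append(state)
    return sequence


# Hypothetical trace: byte offsets on a 100-byte "volume", two regions.
trace = [0, 10, 60, 70, 5, 55]
model = fit_transitions(trace, volume_size=100, num_regions=2)
synthetic = generate_regions(model, start=0, length=20)
```

A real benchmark generator would also need request sizes, read/write mix, and inter-arrival times, none of which this sketch models.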
