SimProf: A Sampling Framework for Data Analytic Workloads

Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim, Hyesoon Kim
{"title":"SimProf: A Sampling Framework for Data Analytic Workloads","authors":"Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim, Hyesoon Kim","doi":"10.1109/IPDPS.2017.118","DOIUrl":null,"url":null,"abstract":"Today, there is a steep rise in the amount of data being collected from diverse applications. Consequently, data analytic workloads are gaining popularity to gain insight that can benefit the application, e.g., financial trading, social media analysis. To study the architectural behavior of the workloads, architectural simulation is one of the most common approaches. However, because of the long-running nature of the workloads, it is not trivial to identify which parts of the analysis to simulate. In the current work, we introduce SimProf, a sampling framework for data analytic workloads. Using this tool, we are able to select representative simulation points based on the phase behavior of the analysis at a method level granularity. This provides a better understanding of the simulation point and also reduces the simulation time for different input sets. We present the framework for Apache Hadoop and Apache Spark frameworks, which can be easily extended to other data analytic workloads.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.118","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Today, there is a steep rise in the amount of data being collected from diverse applications. Consequently, data analytic workloads are gaining popularity to gain insight that can benefit the application, e.g., financial trading, social media analysis. To study the architectural behavior of the workloads, architectural simulation is one of the most common approaches. However, because of the long-running nature of the workloads, it is not trivial to identify which parts of the analysis to simulate. In the current work, we introduce SimProf, a sampling framework for data analytic workloads. Using this tool, we are able to select representative simulation points based on the phase behavior of the analysis at a method level granularity. This provides a better understanding of the simulation point and also reduces the simulation time for different input sets. We present the framework for Apache Hadoop and Apache Spark frameworks, which can be easily extended to other data analytic workloads.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SimProf:数据分析工作负载的采样框架
今天,从各种应用程序收集的数据量急剧增加。因此,数据分析工作负载越来越受欢迎,以获得对应用程序有益的洞察力,例如金融交易、社交媒体分析。为了研究工作负载的体系结构行为,体系结构模拟是最常用的方法之一。但是,由于工作负载的长时间运行性质,确定要模拟分析的哪些部分并不是一件容易的事情。在当前的工作中,我们介绍了SimProf,这是一个用于数据分析工作负载的采样框架。使用此工具,我们能够根据方法级别粒度的分析的阶段行为选择具有代表性的模拟点。这样可以更好地理解模拟点,还可以减少不同输入集的模拟时间。我们为Apache Hadoop和Apache Spark框架提供了一个框架,它可以很容易地扩展到其他数据分析工作负载。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL Toucan — A Translator for Communication Tolerant MPI Applications Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs Dynamic Memory-Aware Task-Tree Scheduling
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1