Matheus Santos, Wagner Meira Jr, D. Guedes, Virgílio A. F. Almeida
Title: Faster: A Low Overhead Framework for Massive Data Analysis
Venue: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Publication date: 2016-05-16
DOI: 10.1109/CCGrid.2016.90 (https://doi.org/10.1109/CCGrid.2016.90)
Citations: 2
Abstract
With the recent accelerated increase in the amount of social data available on the Internet, several distributed big data processing frameworks have been proposed and implemented. Hadoop has been widely used to process all kinds of data, not only from social media. Spark is gaining popularity for offering a more flexible, object-functional programming interface, and also for improving performance in many cases. However, not all data analysis algorithms perform well on Hadoop or Spark. For instance, graph algorithms tend to generate large numbers of messages between processing elements, which may result in poor performance even in Spark. We introduce Faster, a low-latency distributed processing framework designed to exploit data locality to reduce processing costs in such algorithms. It offers an API similar to Spark's, but with a slightly different execution model and new operators. Our results show that it can significantly outperform Spark on large graphs, being up to one order of magnitude faster when running PageRank on a partial Google+ friendship graph with more than one billion edges.
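To make the message-volume concern concrete, here is a minimal single-process sketch of iterative PageRank that also counts how many rank contributions would cross a partition boundary. It does not use Faster's or Spark's API, neither of which is described here; the function name `pagerank`, the `vertex % num_partitions` placement rule, and the damping value are all illustrative assumptions.

```python
from collections import defaultdict


def pagerank(edges, num_partitions=2, damping=0.85, iterations=20):
    """Iterative PageRank that also counts cross-partition messages.

    edges: list of (src, dst) pairs with integer vertex ids.
    Vertices are placed by `vertex % num_partitions`, which is an
    assumption made for illustration, not any framework's scheme.
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    remote_messages = 0

    for _ in range(iterations):
        incoming = defaultdict(float)
        dangling = 0.0
        for v in vertices:
            targets = out_links[v]
            if not targets:
                # A vertex with no out-links spreads its rank uniformly.
                dangling += ranks[v]
                continue
            share = ranks[v] / len(targets)
            for dst in targets:
                incoming[dst] += share
                # A contribution to a vertex on another partition would
                # be a network message in a distributed setting.
                if v % num_partitions != dst % num_partitions:
                    remote_messages += 1
        ranks = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return ranks, remote_messages


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
    ranks, remote = pagerank(toy_edges, num_partitions=2, iterations=10)
    print(ranks, remote)
```

In this model every edge carries one contribution per iteration, so the number of messages grows with edges times iterations, and only the share that stays within a partition avoids the network. That is the cost a locality-aware placement would aim to reduce.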