Authors: Alex Popa, M. Frîncu, C. Chelmis
Venue: 2017 IEEE High Performance Extreme Computing Conference (HPEC)
Published: 2017-09-01
DOI: 10.1109/HPEC.2017.8091084 (https://doi.org/10.1109/HPEC.2017.8091084)
A distributed algorithm for the efficient computation of the unified model of social influence on massive datasets
Online social networks offer a rich data source for analyzing diffusion processes, including rumor and viral spreading in communities. While many models exist, a unified model that enables analytical computation of complex, nonlinear phenomena while considering multiple factors was only recently proposed. We design an optimized implementation of the unified model of influence for distributed, vertex-centric graph processing platforms such as Apache Giraph. We validate the implementation and test its weak and strong scalability on a Hadoop and Giraph installation on Google Cloud Platform using two real datasets. Results show an approximately 3.2× performance improvement over the single-node runtime on the same platform. We also analyze the cost of achieving this speedup on public clouds, the impact of the underlying platform, and the need for more distributed nodes to process the same dataset than a shared-memory system requires.
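The scalability and cost terms used in the abstract can be made concrete with a short sketch. It uses the standard textbook definitions of strong-scaling speedup, parallel efficiency, weak-scaling efficiency, and pay-per-node-hour cost; it is not taken from the paper. The names `Run`, `strong_scaling`, `weak_scaling_efficiency`, and `cloud_cost` are hypothetical, and every number in the usage lines (node counts, runtimes, price) is invented for illustration and is not a measurement from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Run:
    """One timed execution (hypothetical record type, not from the paper)."""
    nodes: int       # number of worker nodes used
    seconds: float   # measured wall-clock runtime in seconds


def strong_scaling(baseline: Run, run: Run) -> Tuple[float, float]:
    """Speedup and parallel efficiency when the dataset size is held fixed."""
    speedup = baseline.seconds / run.seconds
    # Efficiency is the speedup divided by the factor of extra nodes used.
    efficiency = speedup / (run.nodes / baseline.nodes)
    return speedup, efficiency


def weak_scaling_efficiency(baseline: Run, run: Run) -> float:
    """Efficiency when the dataset grows in proportion to the node count.

    Ideal weak scaling keeps the runtime constant, giving a value of 1.0.
    """
    return baseline.seconds / run.seconds


def cloud_cost(run: Run, price_per_node_hour: float) -> float:
    """Cost of a run under simple per-node-hour billing (an assumption)."""
    return run.nodes * price_per_node_hour * run.seconds / 3600.0


# Illustrative numbers only: a single-node run versus an 8-node run.
single = Run(nodes=1, seconds=3200.0)
cluster = Run(nodes=8, seconds=1000.0)

speedup, efficiency = strong_scaling(single, cluster)
print(f"speedup={speedup:.2f} efficiency={efficiency:.2f}")
print(f"single-node cost={cloud_cost(single, 0.10):.4f}")
print(f"cluster cost={cloud_cost(cluster, 0.10):.4f}")
```

With these made-up inputs, the cluster run finishes faster but costs more in total. That is the kind of trade-off the abstract refers to when it discusses the cost of achieving the speedup on public clouds.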