Accelerating outlier detection with intra- and inter-node parallelism
F. Angiulli, S. Basta, Stefano Lodi, Claudio Sartori
2014 International Conference on High Performance Computing & Simulation (HPCS), pp. 476-483. Published 2014-07-21.
DOI: https://doi.org/10.1109/HPCSim.2014.6903723
Citations: 3
Abstract
Outlier detection is a data mining task that consists of discovering observations which deviate substantially from the rest of the data, and it has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding. The size limit of the data that can be processed is pushed considerably forward by combining three ingredients: efficient algorithms, the intra-CPU parallelism of high-performance architectures, and network-level parallelism. In this paper we propose an outlier detection algorithm able to exploit the internal parallelism of a GPU and the external parallelism of a cluster of GPUs. The algorithm is an evolution of our previous solutions, which considered either GPU or network-level parallelism. We discuss a set of large-scale experiments executed in a supercomputing facility and show the speedup obtained with a varying number of nodes.
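The abstract does not specify the algorithm, so the following is only an illustrative sketch of the general pattern it describes, not the authors' method. It assumes a common distance-based definition (a point's outlier score is the distance to its k-th nearest neighbor), and it uses threads as stand-ins for cluster nodes. All names (`local_knn_distances`, `knn_score`, `top_outliers`) are hypothetical. In a real system the inner distance loop on each node is the part one might offload to a GPU.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn_distances(partition, query_id, query, k):
    """One simulated node: the k smallest distances from the query
    to the points stored locally (the query itself is skipped)."""
    dists = (math.dist(query, p) for pid, p in partition if pid != query_id)
    return heapq.nsmallest(k, dists)


def knn_score(partitions, query_id, query, k, pool):
    """Coordinator: merge the per-node candidates and return the
    global k-th nearest-neighbor distance (the assumed outlier score)."""
    futures = [pool.submit(local_knn_distances, part, query_id, query, k)
               for part in partitions]
    merged = heapq.nsmallest(k, (d for f in futures for d in f.result()))
    return merged[-1]  # assumes the data set has more than k points


def top_outliers(points, k=2, n=1, num_nodes=3):
    """Return the indices of the n points with the largest scores."""
    indexed = list(enumerate(points))
    # Round-robin split of the data across the simulated nodes.
    partitions = [indexed[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        scores = [(knn_score(partitions, i, p, k, pool), i)
                  for i, p in indexed]
    return [i for _, i in heapq.nlargest(n, scores)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (0.5, 0.5)]
result = top_outliers(points, k=2, n=1)
```

Because each node returns only its k best local candidates, the coordinator merges at most k values per node, and the result does not depend on how the data is partitioned.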