Optimal Grid Exploitation Algorithms for Data Mining

2006 Fifth International Symposium on Parallel and Distributed Computing. Pub Date: 2006-07-06. DOI: 10.1109/ISPDC.2006.36

Valérie Fiolet, R. Olejnik, Guillem Lefait, B. Toursel


Citations: 18

Abstract

Although many data mining tasks have been parallelized and can thus be executed on dedicated clusters, few solutions currently exist for solving data mining problems on a grid or a non-specialized network of workstations. The current trend is to use grids and/or desktop grids to exploit any available workstations, regardless of their physical location. Although a grid-specific algorithm shares some characteristics with a dedicated-cluster algorithm, many constraints are inherent to the use of the grid. In particular, resource volatility and communication costs reduce the effectiveness of parallelism. The DisDaMin (Distributed Data Mining) project revisits data mining tasks and proposes new algorithms that can be exploited on grids. The DisDaMin mechanisms first fragment the data using clustering methods, and then apply asynchronous collaborative techniques suited to the specifics of execution on grids. This fragmentation method makes it possible to carry out optimal local processing on each node with a minimum of communication. On this basis, we introduce the distributed algorithm DICCoop, an adaptation of DIC by Brin et al. (1997). Simulations hosted on the French national grid GRID5000 (part of the European CoreGrid) were performed to demonstrate the efficiency of the proposed mechanisms. We analyse the impact of the numerous parameters on the optimization of parallel efficiency.
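The fragment-then-process-locally idea in the abstract can be illustrated with a toy Python sketch. This is not the DisDaMin mechanism or the DICCoop algorithm, neither of which is specified here. The function names, the greedy Jaccard-based clustering, the fixed itemset size, and the sequential loop standing in for grid nodes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fragment(transactions, n_fragments):
    """Hypothetical fragmentation step: seed one fragment with each of the
    first n_fragments transactions, then assign every transaction to the
    fragment whose seed it resembles most (ties go to the lowest index)."""
    seeds = [frozenset(t) for t in transactions[:n_fragments]]
    fragments = [[] for _ in seeds]
    for t in transactions:
        items = frozenset(t)
        best = max(range(len(seeds)), key=lambda i: jaccard(items, seeds[i]))
        fragments[best].append(items)
    return fragments


def local_counts(frag, size):
    """Count every itemset of the given size inside one fragment.
    On a grid this would run on a single node, without communication."""
    counts = Counter()
    for items in frag:
        counts.update(combinations(sorted(items), size))
    return counts


def mine(transactions, n_fragments, size, min_support):
    """Merge per-fragment counts and keep itemsets that reach min_support.
    Only the compact count tables are merged, never the raw transactions."""
    total = Counter()
    for frag in fragment(transactions, n_fragments):
        total.update(local_counts(frag, size))
    return {itemset: c for itemset, c in total.items() if c >= min_support}


transactions = [
    ["a", "b"],
    ["c", "d"],
    ["a", "b", "e"],
    ["c", "d", "f"],
    ["a", "b"],
]
frequent_pairs = mine(transactions, n_fragments=2, size=2, min_support=2)
```

Because similar transactions land in the same fragment, most of the support for a given itemset is found on one node, which is the intuition behind keeping communication low.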

