Optimal Grid Exploitation Algorithms for Data Mining

Valérie Fiolet, R. Olejnik, Guillem Lefait, B. Toursel
2006 Fifth International Symposium on Parallel and Distributed Computing (ISPDC 2006), published 2006-07-06. DOI: 10.1109/ISPDC.2006.36
Citations: 18

Abstract

Although many data mining tasks have been parallelized and can thus be executed on dedicated clusters, few solutions currently exist for data mining on a grid or on a non-specialized network of workstations. The current trend is to use grids and/or desktop grids in order to exploit any available workstation regardless of its physical location. Although a grid-specific algorithm shares some characteristics with a dedicated-cluster algorithm, the grid imposes many inherent constraints; in particular, resource volatility and communication costs reduce the effectiveness of parallelism. The DisDaMin project (distributed data mining) revisits data mining tasks and proposes new algorithms suited to grids. The DisDaMin mechanisms first fragment the data in a specific way using clustering methods, and then apply asynchronous collaborative techniques adapted to the specifics of grid execution. This fragmentation makes it possible to carry out optimal local processing on each node with a minimum of communication. Building on it, we introduce the distributed algorithm DICCoop, an adaptation of DIC (Dynamic Itemset Counting) by Brin et al. (1997). Simulations, hosted on the French national grid GRID5000 (part of the European CoreGrid), were performed to demonstrate the efficiency of the proposed mechanisms. We analyse the impact of the numerous parameters on the optimization of parallel efficiency.
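The abstract builds DICCoop on DIC but does not describe DIC itself. The following is a minimal, sequential Python sketch of the classic Dynamic Itemset Counting idea, offered only as background under stated assumptions; it is not the authors' distributed DICCoop and includes none of DisDaMin's clustering-based fragmentation or grid cooperation. The function name `dic` and the parameters `min_count` (absolute support threshold) and `m` (transactions read between checkpoints, assumed to be at least 1) are hypothetical names chosen for illustration.

```python
from itertools import combinations


def dic(transactions, min_count, m):
    """Simplified sketch of Dynamic Itemset Counting (Brin et al., 1997).

    Hypothetical interface: transactions is a list of item collections,
    min_count an absolute support threshold, m (>= 1) the number of
    transactions read between checkpoints. Returns a dict mapping each
    frequent itemset (frozenset) to its exact support count.
    """
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    if n == 0:
        return {}
    items = sorted({i for t in tx for i in t})
    count = {}      # itemset -> occurrences counted so far
    seen = {}       # itemset -> transactions read since its counting started
    dashed = set()  # itemsets still being counted (dashed circles and boxes)
    boxes = set()   # itemsets whose count has already reached min_count
    for i in items:
        s = frozenset([i])
        count[s], seen[s] = 0, 0
        dashed.add(s)

    pos = 0
    while dashed:
        # Read the next block of m transactions, wrapping around the dataset.
        for _ in range(m):
            t = tx[pos]
            pos = (pos + 1) % n
            for s in dashed:
                if seen[s] < n:  # each itemset is counted over exactly one full pass
                    seen[s] += 1
                    if s <= t:
                        count[s] += 1
        # Checkpoint: promote itemsets that reached the threshold (circle -> box).
        new_boxes = {s for s in dashed if s not in boxes and count[s] >= min_count}
        boxes |= new_boxes
        # Start counting supersets whose immediate subsets are all boxes,
        # without waiting for the current pass over the data to finish.
        for s in new_boxes:
            for x in items:
                if x in s:
                    continue
                cand = s | {x}
                if cand in count:
                    continue
                if all(frozenset(sub) in boxes
                       for sub in combinations(cand, len(cand) - 1)):
                    count[cand], seen[cand] = 0, 0
                    dashed.add(cand)
        # Itemsets counted over every transaction become "solid" and stop counting.
        dashed = {s for s in dashed if seen[s] < n}
    return {s: count[s] for s in boxes}


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(dic(data, min_count=3, m=2))
```

Because each itemset is counted over exactly one cyclic pass starting at the checkpoint where it was added, its final count is exact. The gain over a level-wise, Apriori-style scan is that longer itemsets begin counting mid-pass as soon as all their subsets look frequent, which reduces the number of passes over the data.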