PaDDMAS: parallel and distributed data mining application suite

O. Rana, D. Walker, Maozhen Li, S. Lynden, M. Ward
Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS 2000), May 2000
DOI: 10.1109/IPDPS.2000.846010
Citations: 29

Abstract

Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms are employed to perform data analysis within a domain, ranging from statistical to machine learning and AI-based techniques. However, several issues must be addressed to scale such approaches to large data sets, particularly when they are applied to data distributed across several sites. As new analysis techniques are identified, the core tool set must allow such analytical components to be integrated easily. Similarly, results from analysis engines must be shareable, to enable storage, visualisation or further analysis. We describe the architecture of PaDDMAS, a component-based system for developing distributed data mining applications. PaDDMAS provides a tool set for combining pre-developed or custom components using a dataflow approach, with components performing analysis, data extraction, or data management and translation. Each component is wrapped as a Java/CORBA object and has an interface defined in XML. Components can be serial or parallel objects, and may be binary or have a more complex internal structure. We demonstrate a prototype using a neural network analysis algorithm.
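To make the dataflow idea more concrete, here is a minimal Python sketch of components whose interfaces are declared in XML and then wired together into a pipeline. It rests on several assumptions. The XML vocabulary (`component`, `input`, `output`, `port`, `type`), the `Dataflow` class and the component names are all hypothetical and are not PaDDMAS's actual schema or API. Plain in-process Python functions also stand in for the Java/CORBA-wrapped components described in the abstract. What the sketch tries to show is that declared port types let a tool reject an incompatible connection before any data flows, and that the component graph can then be executed in dependency order.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical XML interface descriptions (NOT the real PaDDMAS schema).
INTERFACES = """
<components>
  <component name="Extract">
    <output port="rows" type="table"/>
  </component>
  <component name="Normalise">
    <input port="rows" type="table"/>
    <output port="rows" type="table"/>
  </component>
  <component name="Summarise">
    <input port="rows" type="table"/>
    <output port="stats" type="dict"/>
  </component>
</components>
"""


def parse_interfaces(xml_text):
    """Map each component name to its declared input and output port types."""
    root = ET.fromstring(xml_text.strip())
    specs = {}
    for comp in root.findall("component"):
        specs[comp.get("name")] = {
            "inputs": {p.get("port"): p.get("type") for p in comp.findall("input")},
            "outputs": {p.get("port"): p.get("type") for p in comp.findall("output")},
        }
    return specs


class Dataflow:
    """Hypothetical dataflow graph: components are functions returning {port: value}."""

    def __init__(self, specs):
        self.specs = specs
        self.impls = {}
        self.edges = []  # (source, source_port, destination, destination_port)

    def register(self, name, fn):
        if name not in self.specs:
            raise KeyError(f"no interface declared for {name}")
        self.impls[name] = fn

    def connect(self, src, src_port, dst, dst_port):
        # Check the declared types before accepting the connection.
        out_type = self.specs[src]["outputs"][src_port]
        in_type = self.specs[dst]["inputs"][dst_port]
        if out_type != in_type:
            raise TypeError(f"{src}.{src_port} ({out_type}) -> {dst}.{dst_port} ({in_type})")
        self.edges.append((src, src_port, dst, dst_port))

    def run(self):
        # Map each component to the set of components it depends on.
        graph = {name: set() for name in self.impls}
        for src, _, dst, _ in self.edges:
            graph[dst].add(src)
        results = {}
        # static_order() yields every component after all of its predecessors.
        for name in TopologicalSorter(graph).static_order():
            kwargs = {dp: results[s][sp] for s, sp, d, dp in self.edges if d == name}
            results[name] = self.impls[name](**kwargs)
        return results


flow = Dataflow(parse_interfaces(INTERFACES))
flow.register("Extract", lambda: {"rows": [[1.0, 2.0], [3.0, 6.0]]})
flow.register("Normalise", lambda rows: {"rows": [[v / max(r) for v in r] for r in rows]})
flow.register("Summarise", lambda rows: {
    "stats": {"n": len(rows), "mean_first": sum(r[0] for r in rows) / len(rows)}
})
flow.connect("Extract", "rows", "Normalise", "rows")
flow.connect("Normalise", "rows", "Summarise", "rows")
results = flow.run()
print(results["Summarise"]["stats"])  # {'n': 2, 'mean_first': 0.5}
```

Calling `flow.connect("Summarise", "stats", "Normalise", "rows")` would raise `TypeError`, because the declared `dict` output does not match the `table` input. A genuinely distributed system would also need to locate remote components and move data between sites, which this sketch leaves out.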