{"title":"Scalable data mining with log based consistency DSM for high performance distributed computing","authors":"Hideaki Hirayama, H. Honda, T. Yuba","doi":"10.1109/ICECCS.2000.873938","DOIUrl":null,"url":null,"abstract":"Mining the large Web based online distributed databases to discover new knowledge and financial gain is an important research problem. These computations require high performance distributed and parallel computing environments. Traditional data mining techniques such as classification, association, clustering can be extended to find new efficient solutions. The paper presents the scalable data mining problem, proposes the use of software DSM (distributed shared memory) with a new mechanism as an effective solution and discusses both the implementation and performance evaluation results. It is observed that the overhead of a software DSM is very large for scalable data mining programs. A new Log Based Consistency (LBC) mechanism, especially designed for scalable data mining on the software DSM is proposed to overcome this overhead. Traditional association rule based data mining programs frequently modify the same fields by count-up operations. In contrast, the LBC mechanism keeps up the consistency by broadcasting the count-up operation logs among the multiple nodes.","PeriodicalId":228728,"journal":{"name":"Proceedings Sixth IEEE International Conference on Engineering of Complex Computer Systems. ICECCS 2000","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Sixth IEEE International Conference on Engineering of Complex Computer Systems. ICECCS 2000","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECCS.2000.873938","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Mining the large Web based online distributed databases to discover new knowledge and financial gain is an important research problem. These computations require high performance distributed and parallel computing environments. Traditional data mining techniques such as classification, association, clustering can be extended to find new efficient solutions. The paper presents the scalable data mining problem, proposes the use of software DSM (distributed shared memory) with a new mechanism as an effective solution and discusses both the implementation and performance evaluation results. It is observed that the overhead of a software DSM is very large for scalable data mining programs. A new Log Based Consistency (LBC) mechanism, especially designed for scalable data mining on the software DSM is proposed to overcome this overhead. Traditional association rule based data mining programs frequently modify the same fields by count-up operations. In contrast, the LBC mechanism keeps up the consistency by broadcasting the count-up operation logs among the multiple nodes.