A suffix tree based parallel approach for association rule mining and biclustering
K. Mondal, Sayan Bhattacharya, A. Mondal
2016 International Conference on Computer, Electrical & Communication Engineering (ICCECE), December 2016. DOI: 10.1109/ICCECE.2016.8009578
Data mining is the process of analyzing raw data from very large databases to turn it into useful, previously unknown information, such as interesting patterns, trends, and relationships within the data. Association rule mining and biclustering are two important data mining tasks in many application domains, especially bioinformatics. FIST is one of the very few algorithms that extract bases of association rules and biclusters jointly in a single process. It is based on the frequent closed itemsets framework and uses a suffix-tree-based data structure for efficiency. However, because it executes sequentially, the traditional FIST algorithm suffers from long execution times on very large, high-dimensional data sets. Here, we propose a parallelized version of FIST, called ParaFIST, to improve performance. ParaFIST takes a multi-threaded approach in which the branches of the suffix tree are processed in parallel, reducing execution time. We use an example to demonstrate the correctness of the proposed algorithm.
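The idea of processing independent tree branches in parallel, and of reading each frequent closed itemset together with its supporting objects as a bicluster, can be illustrated with a small sketch. This is not FIST or ParaFIST. It is a hypothetical, brute-force stand-in under stated assumptions: the toy dataset `DATA`, the helper names (`support`, `closure`, `mine_branch`, `mine_parallel`), and the choice to split the search space by each itemset's smallest item (so every itemset falls in exactly one "branch") are all illustrative and do not come from the paper. No suffix tree is built here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical toy binary dataset: object (e.g. sample) -> items (e.g. genes) it contains.
DATA = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"a", "b", "c"},
    "o4": {"c"},
}


def support(itemset, data):
    """Objects containing every item of `itemset`."""
    return frozenset(o for o, items in data.items() if itemset <= items)


def closure(objects, data):
    """Items shared by all `objects` (assumes `objects` is non-empty)."""
    return frozenset.intersection(*(frozenset(data[o]) for o in objects))


def mine_branch(first, items, data, minsup):
    """Frequent closed itemsets whose smallest item is `first` (one hypothetical branch)."""
    later = [i for i in items if i > first]
    found = []
    # Brute-force enumeration of the branch; FIST itself uses a suffix tree instead.
    for k in range(len(later) + 1):
        for rest in combinations(later, k):
            itemset = frozenset((first, *rest))
            objs = support(itemset, data)
            # Keep it only if frequent and equal to its own closure (i.e. closed).
            if objs and len(objs) >= minsup and closure(objs, data) == itemset:
                found.append((itemset, objs))  # (items, objects) = one bicluster
    return found


def mine_parallel(data, minsup, workers=4):
    """Mine each branch in its own worker thread, then merge the branch results."""
    items = sorted(set().union(*data.values()))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Branches are disjoint, so their results can simply be concatenated.
        branches = pool.map(lambda i: mine_branch(i, items, data, minsup), items)
        return [bicluster for branch in branches for bicluster in branch]


if __name__ == "__main__":
    for itemset, objs in mine_parallel(DATA, minsup=2):
        print(sorted(itemset), sorted(objs))
```

On `DATA` with `minsup=2`, the sketch reports three closed itemsets: {a, b} on objects o1, o2, o3; {a, b, c} on o1, o3; and {c} on o1, o3, o4. The itemset {a} is not closed because every object containing a also contains b, which corresponds to the exact association rule a → b. Because the branches share no itemsets, each can be mined without coordination and merged at the end. Note that in CPython the global interpreter lock prevents pure-Python threads from giving a real CPU speedup, so a `ProcessPoolExecutor` (or a runtime without a GIL) would be needed to see the kind of execution-time gains the paper targets.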