Large-Scale Sequential Utility Pattern Mining in Uncertain Environments
Authors: J. Wu, Shuo Liu, Jerry Chun‐wei Lin
Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW), 2022-11-01
DOI: https://doi.org/10.1109/ICDMW58026.2022.00077
High utility sequential pattern mining (HUSPM) considers timestamps, internal quantities, and external utilities to mine high utility sequential patterns (HUSPs), and it has become an important task in data mining. In real-world applications, collected data are often uncertain because of environmental factors, equipment limitations, privacy issues, and similar causes. As the volume of uncertain data grows rapidly, the efficiency of traditional mining algorithms degrades severely: on large datasets, a conventional stand-alone algorithm generates more candidate sequences, occupies a large amount of memory, and runs significantly slower. This paper designs a MapReduce-based algorithm for mining high utility probability sequential patterns. The algorithm uses the MapReduce framework to overcome the single-machine bottleneck that arises when the data volume is too large. It also adopts a pruning strategy that reduces the number of candidate itemsets generated, which improves the performance of the designed model. Experiments verify the performance of the proposed algorithm, and its correctness and completeness are demonstrated and discussed.
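The abstract does not specify the pruning strategy or the MapReduce job layout, so the following is only a minimal single-process sketch of the general idea, not the authors' algorithm. It assumes each sequence carries one existence probability, and it swaps in a widely used bound, sequence-weighted utility (SWU), together with expected support, to discard unpromising items. The toy database `DB`, the `PROFIT` table, the function names, and both thresholds are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical toy data: (sequence probability, [itemsets of (item, quantity)]).
DB = [
    (0.9, [[("a", 2), ("b", 1)], [("c", 3)]]),
    (0.6, [[("b", 2)], [("a", 1), ("d", 1)]]),
    (0.8, [[("c", 1)], [("d", 2)]]),
]
# Hypothetical external utility (unit profit) per item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 10}


def sequence_utility(itemsets):
    """Total utility of one sequence: sum of quantity * unit profit."""
    return sum(q * PROFIT[i] for itemset in itemsets for i, q in itemset)


def map_partition(partition):
    """Map step: per item, accumulate [SWU, expected support] for one partition."""
    out = defaultdict(lambda: [0, 0.0])
    for prob, itemsets in partition:
        su = sequence_utility(itemsets)
        # Each distinct item is counted once per sequence.
        for item in {i for itemset in itemsets for i, _ in itemset}:
            out[item][0] += su    # SWU: utility of every sequence holding the item
            out[item][1] += prob  # expected support: sum of sequence probabilities
    return out


def reduce_counts(left, right):
    """Reduce step: merge two partial results by summing values per item."""
    merged = defaultdict(lambda: [0, 0.0], {k: list(v) for k, v in left.items()})
    for item, (swu, esup) in right.items():
        merged[item][0] += swu
        merged[item][1] += esup
    return merged


def promising_items(partitions, min_util, min_prob):
    """Keep only items whose SWU and expected support reach both thresholds."""
    totals = reduce(reduce_counts, map(map_partition, partitions))
    return sorted(
        item
        for item, (swu, esup) in totals.items()
        if swu >= min_util and esup >= min_prob
    )


if __name__ == "__main__":
    # Two partitions stand in for two mapper inputs.
    print(promising_items([DB[:2], DB[2:]], min_util=35, min_prob=1.5))
```

Because SWU never underestimates the utility of a pattern containing the item, dropping items below `min_util` cannot lose a high utility pattern under this model. In a real MapReduce deployment, `map_partition` and `reduce_counts` would run as distributed tasks instead of local function calls.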