TKC: Mining Top-K Cross-Level High Utility Itemsets

M. Nouioua, Ying Wang, Philippe Fournier-Viger, Jerry Chun-Wei Lin, J. Wu
Published in: 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00095 (https://doi.org/10.1109/ICDMW51313.2020.00095)
Citations: 5

Abstract

High utility itemset mining is a well-studied data mining task for analyzing customer transactions. The goal is to find all high utility itemsets, that is, sets of items purchased together that generate a profit equal to or greater than a user-defined minimum utility threshold. However, a limitation of traditional high utility itemset mining algorithms is that they ignore item categories (e.g. drinks, dairy products). Recently, two algorithms were designed to find multi-level and cross-level high utility itemsets, which reveal relationships between items and/or categories of items. This is achieved by considering a product taxonomy, in which items are organized into a hierarchy. Though these algorithms can reveal interesting patterns, setting the minimum utility threshold is not intuitive, and the chosen value greatly influences both the number of patterns found and the algorithms' performance. If the user sets the threshold too low, a huge number of patterns is found and runtimes can be very long; if the threshold is set too high, few patterns are found. Hence, a user often has to run an algorithm numerous times to find a threshold value that yields just enough patterns. This paper addresses this issue by presenting a novel algorithm called TKC (Top-K Cross-level high utility itemset miner), which lets the user directly set the number of patterns $k$ to be discovered. TKC performs a depth-first search and includes search space pruning techniques and an optimization to enhance its performance. Experiments were done on retail data with taxonomy information. Results indicate that the algorithm is efficient and that the optimization improves its performance.
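
To make the problem statement concrete, the following is a minimal sketch of top-k cross-level high utility itemset mining on a toy dataset. It is not the TKC algorithm: it enumerates candidates exhaustively instead of using TKC's depth-first search and pruning, and it only shows how a taxonomy extends transactions and how a size-k min-heap replaces a user-set utility threshold. The taxonomy, the transactions, the `max_size` cap, and all function names are hypothetical, and the sketch assumes that the utility of a category in a transaction is the sum of the utilities of its descendant items in that transaction.

```python
import heapq
from itertools import combinations

# Hypothetical taxonomy: child -> parent category.
TAXONOMY = {
    "cola": "drinks",
    "juice": "drinks",
    "milk": "dairy",
    "cheese": "dairy",
}

# Hypothetical transactions: item -> utility (e.g. quantity * unit profit).
TRANSACTIONS = [
    {"cola": 4, "milk": 6},
    {"juice": 3, "cheese": 10},
    {"cola": 2, "juice": 3, "milk": 3},
]


def ancestors(item):
    """Return every category above `item` in the taxonomy."""
    result = set()
    while item in TAXONOMY:
        item = TAXONOMY[item]
        result.add(item)
    return result


def extend(transaction):
    """Add each category, with utility summed over its items in the transaction."""
    extended = dict(transaction)
    for item, item_utility in transaction.items():
        for category in ancestors(item):
            extended[category] = extended.get(category, 0) + item_utility
    return extended


def utility(itemset, extended_db):
    """Sum the itemset's utility over the transactions that contain all of it."""
    return sum(
        sum(t[i] for i in itemset)
        for t in extended_db
        if all(i in t for i in itemset)
    )


def top_k_cross_level(transactions, k, max_size=2):
    """Brute-force top-k search over items and categories (illustration only)."""
    extended_db = [extend(t) for t in transactions]
    symbols = sorted({i for t in extended_db for i in t})
    heap = []  # min-heap of (utility, itemset); heap[0] acts as the threshold
    for size in range(1, max_size + 1):
        for itemset in combinations(symbols, size):
            # Skip itemsets that pair an item with one of its own ancestors.
            if any(a in ancestors(b) for a in itemset for b in itemset):
                continue
            u = utility(itemset, extended_db)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                # The new itemset evicts the weakest one, raising the threshold.
                heapq.heapreplace(heap, (u, itemset))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    for u, itemset in top_k_cross_level(TRANSACTIONS, k=3):
        print(u, itemset)
```

The user supplies only `k`; the smallest utility in the heap plays the role of the minimum utility threshold and rises as better itemsets are found. A practical miner would also use that rising threshold to prune the search space, which this sketch does not attempt.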