Sequential Pattern Sampling with Norm Constraints

Lamine Diop, Cheikh Talibouya Diop, A. Giacometti, Dominique H. Li, Arnaud Soulet
{"title":"Sequential Pattern Sampling with Norm Constraints","authors":"Lamine Diop, Cheikh Talibouya Diop, A. Giacometti, Dominique H. Li, Arnaud Soulet","doi":"10.1109/ICDM.2018.00024","DOIUrl":null,"url":null,"abstract":"In recent years, the field of pattern mining has shifted to user-centered methods. In such a context, it is necessary to have a tight coupling between the system and the user where mining techniques provide results at any time or within a short response time of only few seconds. Pattern sampling is a non-exhaustive method for instantly discovering relevant patterns that ensures a good interactivity while providing strong statistical guarantees due to its random nature. Curiously, such an approach investigated for itemsets and subgraphs has not yet been applied to sequential patterns, which are useful for a wide range of mining tasks and application fields. In this paper, we propose the first method for sequential pattern sampling. In addition to address sequential data, the originality of our approach is to introduce a constraint on the norm to control the length of the drawn patterns and to avoid the pitfall of the \"long tail\" where the rarest patterns flood the user. We propose a new constrained two-step random procedure, named CSSampling, that randomly draws sequential patterns according to frequency with an interval constraint on the norm. We demonstrate that this method performs an exact sampling. Moreover, despite the use of rejection sampling, the experimental study shows that CSSampling remains efficient and the constraint helps to draw general patterns of the \"head\". We also illustrate how to benefit from these sampled patterns to instantly build an associative classifier dedicated to sequences. This classification approach rivals state of the art proposals showing the interest of constrained sequential pattern sampling.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

In recent years, the field of pattern mining has shifted to user-centered methods. In such a context, it is necessary to have a tight coupling between the system and the user where mining techniques provide results at any time or within a short response time of only few seconds. Pattern sampling is a non-exhaustive method for instantly discovering relevant patterns that ensures a good interactivity while providing strong statistical guarantees due to its random nature. Curiously, such an approach investigated for itemsets and subgraphs has not yet been applied to sequential patterns, which are useful for a wide range of mining tasks and application fields. In this paper, we propose the first method for sequential pattern sampling. In addition to address sequential data, the originality of our approach is to introduce a constraint on the norm to control the length of the drawn patterns and to avoid the pitfall of the "long tail" where the rarest patterns flood the user. We propose a new constrained two-step random procedure, named CSSampling, that randomly draws sequential patterns according to frequency with an interval constraint on the norm. We demonstrate that this method performs an exact sampling. Moreover, despite the use of rejection sampling, the experimental study shows that CSSampling remains efficient and the constraint helps to draw general patterns of the "head". We also illustrate how to benefit from these sampled patterns to instantly build an associative classifier dedicated to sequences. This classification approach rivals state of the art proposals showing the interest of constrained sequential pattern sampling.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
范数约束下的顺序模式抽样
近年来,模式挖掘领域已经转向以用户为中心的方法。在这种情况下,有必要在系统和用户之间实现紧密耦合,在这种情况下,挖掘技术可以随时或在只有几秒钟的短响应时间内提供结果。模式抽样是一种非详尽的方法,用于立即发现相关模式,确保良好的交互性,同时由于其随机性而提供强大的统计保证。奇怪的是,这种用于项集和子图的方法尚未应用于序列模式,而序列模式对于广泛的挖掘任务和应用领域是有用的。本文提出了序列模式采样的第一种方法。除了处理顺序数据之外,我们的方法的独创性在于在规范上引入约束来控制绘制模式的长度,并避免“长尾”的陷阱,即最稀有的模式会淹没用户。我们提出了一种新的有约束的两步随机过程CSSampling,它根据频率随机绘制序列模式,范数上有区间约束。我们证明了这种方法可以进行精确的抽样。此外,实验研究表明,尽管使用了拒绝抽样,CSSampling仍然是有效的,并且约束有助于绘制“头部”的一般模式。我们还说明了如何从这些采样模式中获益,以便立即构建专用于序列的关联分类器。这种分类方法与显示约束序列模式采样兴趣的最新建议相竞争。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Entire Regularization Path for Sparse Nonnegative Interaction Model Accelerating Experimental Design by Incorporating Experimenter Hunches Title Page i An Efficient Many-Class Active Learning Framework for Knowledge-Rich Domains Social Recommendation with Missing Not at Random Data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1