NetNMSP: Nonoverlapping maximal sequential pattern mining

IF 3.4 2区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Applied Intelligence Pub Date : 2022-01-10 DOI:10.1007/s10489-021-02912-3
Yan Li, Shuai Zhang, Lei Guo, Jing Liu, Youxi Wu, Xindong Wu
{"title":"NetNMSP: Nonoverlapping maximal sequential pattern mining","authors":"Yan Li,&nbsp;Shuai Zhang,&nbsp;Lei Guo,&nbsp;Jing Liu,&nbsp;Youxi Wu,&nbsp;Xindong Wu","doi":"10.1007/s10489-021-02912-3","DOIUrl":null,"url":null,"abstract":"<div><p>Nonoverlapping sequential pattern mining, as a kind of repetitive sequential pattern mining with gap constraints, can find more valuable patterns. Traditional algorithms focused on finding all frequent patterns and found lots of redundant short patterns. However, it not only reduces the mining efficiency, but also increases the difficulty in obtaining the demand information. To reduce the frequent patterns and retain its expression ability, this paper focuses on the Nonoverlapping Maximal Sequential Pattern (NMSP) mining which refers to finding frequent patterns whose super-patterns are infrequent. In this paper, we propose an effective mining algorithm, Nettree for NMSP mining (NetNMSP), which has three key steps: calculating the support, generating the candidate patterns, and determining NMSPs. To efficiently calculate the support, NetNMSP employs the backtracking strategy to obtain a nonoverlapping occurrence from the leftmost leaf to its root with the leftmost parent node method in a Nettree. To reduce the candidate patterns, NetNMSP generates candidate patterns by the pattern join strategy. Furthermore, to determine NMSPs, NetNMSP adopts the screening method. Experiments on biological sequence datasets verify that not only does NetNMSP outperform the state-of-the-arts algorithms, but also NMSP mining has better compression performance than closed pattern mining. On sales datasets, we validate that our algorithm guarantees the best scalability on large scale datasets. Moreover, we mine NMSPs and frequent patterns in SARS-CoV-1, SARS-CoV-2 and MERS-CoV. The results show that the three viruses are similar in the short patterns but different in the long patterns. More importantly, NMSP mining is easier to find the differences between the virus sequences.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"52 9","pages":"9861 - 9884"},"PeriodicalIF":3.4000,"publicationDate":"2022-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-021-02912-3.pdf","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-021-02912-3","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 18

Abstract

Nonoverlapping sequential pattern mining, as a kind of repetitive sequential pattern mining with gap constraints, can find more valuable patterns. Traditional algorithms focused on finding all frequent patterns and found lots of redundant short patterns. However, it not only reduces the mining efficiency, but also increases the difficulty in obtaining the demand information. To reduce the frequent patterns and retain its expression ability, this paper focuses on the Nonoverlapping Maximal Sequential Pattern (NMSP) mining which refers to finding frequent patterns whose super-patterns are infrequent. In this paper, we propose an effective mining algorithm, Nettree for NMSP mining (NetNMSP), which has three key steps: calculating the support, generating the candidate patterns, and determining NMSPs. To efficiently calculate the support, NetNMSP employs the backtracking strategy to obtain a nonoverlapping occurrence from the leftmost leaf to its root with the leftmost parent node method in a Nettree. To reduce the candidate patterns, NetNMSP generates candidate patterns by the pattern join strategy. Furthermore, to determine NMSPs, NetNMSP adopts the screening method. Experiments on biological sequence datasets verify that not only does NetNMSP outperform the state-of-the-arts algorithms, but also NMSP mining has better compression performance than closed pattern mining. On sales datasets, we validate that our algorithm guarantees the best scalability on large scale datasets. Moreover, we mine NMSPs and frequent patterns in SARS-CoV-1, SARS-CoV-2 and MERS-CoV. The results show that the three viruses are similar in the short patterns but different in the long patterns. More importantly, NMSP mining is easier to find the differences between the virus sequences.

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
NetNMSP:非重叠最大序列模式挖掘
非重叠序列模式挖掘作为一种具有间隙约束的重复序列模式挖掘,可以发现更有价值的模式。传统的算法侧重于寻找所有的频繁模式,并发现了大量冗余的短模式。然而,这不仅降低了挖掘效率,而且增加了获取需求信息的难度。为了减少频繁模式并保持其表达能力,本文重点研究了非重叠最大序列模式(NMSP)挖掘,即寻找超级模式不频繁的频繁模式。在本文中,我们提出了一种有效的NMSP挖掘算法Nettree,该算法包括三个关键步骤:计算支持度、生成候选模式和确定NMSP。为了有效地计算支持,NetNMSP采用回溯策略,利用网络树中最左边的父节点方法,获得从最左边的叶到其根的不重叠出现。为了减少候选模式,NetNMSP通过模式连接策略生成候选模式。此外,为了确定NMSP,NetNMSP采用了筛选方法。在生物序列数据集上的实验验证了NetNMSP不仅优于现有技术的算法,而且NMSP挖掘比封闭模式挖掘具有更好的压缩性能。在销售数据集上,我们验证了我们的算法在大规模数据集上保证了最佳的可扩展性。此外,我们挖掘了NMSP和严重急性呼吸系统综合征冠状病毒1型、严重急性呼吸综合征冠状病毒2型和MERS-CoV的常见模式。结果表明,三种病毒在短模式上相似,但在长模式上不同。更重要的是,NMSP挖掘更容易发现病毒序列之间的差异。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Applied Intelligence
Applied Intelligence 工程技术-计算机:人工智能
CiteScore
6.60
自引率
20.80%
发文量
1361
审稿时长
5.9 months
期刊介绍: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.
期刊最新文献
A prototype evolution network for relation extraction Highway spillage detection using an improved STPM anomaly detection network from a surveillance perspective Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval HG-search: multi-stage search for heterogeneous graph neural networks Channel enhanced cross-modality relation network for visible-infrared person re-identification
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1