Experimental Study of Characterizing Frequent Itemsets Using Representation Learning
Authors: S. Kawanobe, Tomonobu Ozaki
Venue: 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA)
Publication date: 2018-05-01
DOI: 10.1109/WAINA.2018.00082 (https://doi.org/10.1109/WAINA.2018.00082)
Citations: 2
Abstract
Frequent itemset mining is one of the most fundamental problems in data mining. In this task, a set of items is treated as a pattern, and all patterns that appear frequently in a database must be enumerated. Extensive research over a long period has produced sophisticated pattern types for capturing interesting and meaningful information, as well as fast and scalable algorithms. Nevertheless, the low comprehensibility of the obtained patterns is widely recognized as an essential, unsolved drawback of frequent itemset mining. To cope with this drawback, we propose using representation learning to characterize each frequent pattern from various perspectives. Concretely, we perform cluster analysis in the learned vector space to identify representative and outlier patterns, because we believe that these representatives and outliers play important roles in understanding the pattern set as a whole. Furthermore, to obtain significant patterns that play various roles in understanding the pattern set, we utilize the degree of centrality in a pattern network built by drawing edges between similar patterns. Experiments are conducted on a real dataset from the Japanese video-sharing site Nicovideo (nicovideo.jp). The results show the effectiveness of the proposed framework for identifying characteristic patterns with various roles.
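The two characterization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the embedding method, the similarity measure, the clustering algorithm, or the exact centrality measure. The sketch assumes pattern vectors are already available (the toy values below are made up), uses cosine similarity with a hypothetical threshold to draw edges, reads "degree of centrality" as normalized degree centrality, and treats the whole toy set as a single cluster whose centroid defines the representative (nearest) and the outlier (farthest).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def degree_centrality(vectors, threshold):
    """Connect patterns whose similarity is >= threshold (an assumed rule)
    and return each pattern's degree divided by (n - 1)."""
    names = list(vectors)
    degree = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                degree[a] += 1
                degree[b] += 1
    n = len(names)
    return {name: (d / (n - 1) if n > 1 else 0.0) for name, d in degree.items()}


def representative_and_outlier(vectors):
    """Single-cluster simplification: the pattern nearest to the centroid is
    the representative, the farthest one is the outlier."""
    dim = len(next(iter(vectors.values())))
    centroid = [sum(v[k] for v in vectors.values()) / len(vectors) for k in range(dim)]
    dist = {name: math.dist(v, centroid) for name, v in vectors.items()}
    return min(dist, key=dist.get), max(dist, key=dist.get)


# Toy, made-up 2-D embeddings of four frequent itemsets (illustration only).
pattern_vectors = {
    "{a,b}": [1.0, 0.0],
    "{a,c}": [0.9, 0.1],
    "{a,b,c}": [0.8, 0.2],
    "{x,y}": [0.0, 1.0],
}

centrality = degree_centrality(pattern_vectors, threshold=0.95)
representative, outlier = representative_and_outlier(pattern_vectors)
print(centrality)
print("representative:", representative, "outlier:", outlier)
```

In this toy setting the three itemsets containing item `a` are mutually similar and form a connected group, while `{x,y}` has no edges and lies farthest from the centroid. A realistic pipeline would replace the hand-written vectors with learned representations and the single centroid with a proper clustering step.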