{"title":"小熊:多元序列分类使用有界z分数与抽样","authors":"A. Richardson, G. Kaminka, Sarit Kraus","doi":"10.1109/ICDMW.2010.38","DOIUrl":null,"url":null,"abstract":"Multivariate temporal sequence classification is an important and challenging task. Several attempts to address this problem exist, but none provide a full solution. In this paper we present CUBS: Classification Using Bounded Z-Score with Sampling. CUBS uses item set mining to produce frequent subsequences, and then selects among them the statistically significant subsequences to compose a classification model. We introduce an improved item set mining algorithm that solves the short sequence bias present in many item set mining algorithms. Unfortunately, the z-score normalization hinders pruning. We provide a bound on the z-score to address this issue. Calculation of the z-score normalization requires knowledge of some statistical values of the data gathered using a small sample of the database. The sampling causes a distortion in the values. We analyze this distortion and correct it. We evaluate CUBS for accuracy and scalability on a synthetic dataset and on two real world dataset. The results demonstrate how short subsequence bias is solved in the mining, and show how our bound and sampling technique enable speedup.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"CUBS: Multivariate Sequence Classification Using Bounded Z-score with Sampling\",\"authors\":\"A. Richardson, G. Kaminka, Sarit Kraus\",\"doi\":\"10.1109/ICDMW.2010.38\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multivariate temporal sequence classification is an important and challenging task. Several attempts to address this problem exist, but none provide a full solution. In this paper we present CUBS: Classification Using Bounded Z-Score with Sampling. CUBS uses item set mining to produce frequent subsequences, and then selects among them the statistically significant subsequences to compose a classification model. We introduce an improved item set mining algorithm that solves the short sequence bias present in many item set mining algorithms. Unfortunately, the z-score normalization hinders pruning. We provide a bound on the z-score to address this issue. Calculation of the z-score normalization requires knowledge of some statistical values of the data gathered using a small sample of the database. The sampling causes a distortion in the values. We analyze this distortion and correct it. We evaluate CUBS for accuracy and scalability on a synthetic dataset and on two real world dataset. The results demonstrate how short subsequence bias is solved in the mining, and show how our bound and sampling technique enable speedup.\",\"PeriodicalId\":170201,\"journal\":{\"name\":\"2010 IEEE International Conference on Data Mining Workshops\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Conference on Data Mining Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW.2010.38\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Data Mining Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2010.38","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
CUBS: Multivariate Sequence Classification Using Bounded Z-score with Sampling
Multivariate temporal sequence classification is an important and challenging task. Several attempts to address this problem exist, but none provide a full solution. In this paper we present CUBS: Classification Using Bounded Z-Score with Sampling. CUBS uses item set mining to produce frequent subsequences, and then selects among them the statistically significant subsequences to compose a classification model. We introduce an improved item set mining algorithm that solves the short sequence bias present in many item set mining algorithms. Unfortunately, the z-score normalization hinders pruning. We provide a bound on the z-score to address this issue. Calculation of the z-score normalization requires knowledge of some statistical values of the data gathered using a small sample of the database. The sampling causes a distortion in the values. We analyze this distortion and correct it. We evaluate CUBS for accuracy and scalability on a synthetic dataset and on two real world dataset. The results demonstrate how short subsequence bias is solved in the mining, and show how our bound and sampling technique enable speedup.