Authors: Bin Chen, P. Haas, P. Scheuermann
Published in: Proceedings 18th International Conference on Data Engineering (publication date 2002-08-07)
DOI: 10.1109/ICDE.2002.994717 (https://doi.org/10.1109/ICDE.2002.994717)
FAST: a new sampling-based algorithm for discovering association rules
We present FAST (finding associations from sampled transactions), a refined sampling-based mining algorithm that is distinguished from prior algorithms by its novel two-phase approach to sample collection. In phase I, a large sample is collected to quickly and accurately estimate the support of each item in the database. In phase II, a small final sample is obtained by excluding "outlier" transactions in such a manner that the support of each item in the final sample is as close as possible to the estimated support of the item in the entire database. We propose two approaches to obtaining the final sample in phase II: trimming and growing. The trimming procedure starts from the large initial sample and removes outlier transactions until a specified stopping criterion is satisfied. In contrast, the growing procedure selects representative transactions from the initial sample and adds them to an initially empty data set.
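The two-phase idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the outlier criterion, or the stopping rule, so the sketch assumes an L1 distance between item supports, a greedy one-transaction-at-a-time choice, and a fixed target sample size as the stopping criterion. All function names (item_supports, estimate_supports, distance, trim, grow) are hypothetical.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Return the fraction of transactions that contain each item."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def estimate_supports(database, initial_size, rng):
    """Phase I: draw a large simple random sample and estimate item supports."""
    initial_sample = rng.sample(database, initial_size)
    return initial_sample, item_supports(initial_sample)


def distance(sample, target):
    """Assumed metric: L1 distance between sample supports and target supports."""
    if not sample:
        return sum(target.values())
    supports = item_supports(sample)
    items = set(supports) | set(target)
    return sum(abs(supports.get(i, 0.0) - target.get(i, 0.0)) for i in items)


def trim(initial_sample, target, final_size):
    """Phase II (trimming): greedily drop the transaction whose removal
    leaves the remaining sample closest to the target supports."""
    sample = list(initial_sample)
    while len(sample) > final_size:  # assumed stopping criterion: target size
        best = min(
            range(len(sample)),
            key=lambda k: distance(sample[:k] + sample[k + 1:], target),
        )
        sample.pop(best)
    return sample


def grow(initial_sample, target, final_size):
    """Phase II (growing): start empty and greedily add the transaction
    that brings the sample closest to the target supports."""
    pool = list(initial_sample)
    sample = []
    while pool and len(sample) < final_size:
        best = min(
            range(len(pool)),
            key=lambda k: distance(sample + [pool[k]], target),
        )
        sample.append(pool.pop(best))
    return sample


# Toy data: {"z"} is an outlier relative to the target supports.
initial = [{"a"}, {"b"}, {"a"}, {"b"}, {"z"}]
target = {"a": 0.5, "b": 0.5}

trimmed = trim(initial, target, final_size=4)  # drops {"z"}
grown = grow(initial, target, final_size=2)    # picks one {"a"} and one {"b"}

# Phase I on a slightly larger toy database, with a seeded generator.
database = [{"a", "b"}, {"a"}, {"b"}, {"a", "c"}, {"b", "c"}, {"a", "b"}]
phase1_sample, phase1_supports = estimate_supports(database, 4, random.Random(0))
```

Each greedy step recomputes the supports from scratch, which keeps the sketch short but makes it quadratic in the sample size. A practical version would presumably update item counts incrementally as transactions are removed or added.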