The Construction of Transactions for Web Usage Mining
Authors: Yan Li, Boqin Feng
Published in: 2009 International Conference on Computational Intelligence and Natural Computing, 2009-06-06
DOI: 10.1109/CINC.2009.101 (https://doi.org/10.1109/CINC.2009.101)
Citation count: 28
Abstract
This paper presents a data preprocessing system that constructs transactions for web usage mining. To support transaction identification, user sessions and user access paths are extracted from the web access log, and missing information is appended. These tasks rely on the referer-based method, which effectively addresses the problems introduced by proxy servers, local caching, and firewalls. The reference length of each accessed page is calculated with the time spent on data transfer over the Internet taken into account. Two kinds of transactions are then defined: travel-path transactions and content-only transactions. They are constructed by the maximal forward references (MFR) algorithm and the reference length (RL) algorithm, respectively. Verification on a real web access log shows that the transactions can be identified efficiently and that the reliability of the original web access data is clearly improved for further research.
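
To make the two transaction types more concrete, the following is a minimal Python sketch of how they might be derived from a single user session. It is not the authors' implementation: the function names, the input shapes, and the `cutoff_seconds` parameter are assumptions introduced for illustration. The sketch also assumes that the reference lengths passed in have already been corrected for data transfer time, since the abstract does not say how that correction is computed.

```python
from typing import List, Sequence, Tuple


def maximal_forward_references(path: Sequence[str]) -> List[List[str]]:
    """Split one session's access path into travel-path transactions.

    A forward path is emitted each time the user steps back to a page
    already on the current path, provided the previous move was forward.
    """
    transactions: List[List[str]] = []
    current: List[str] = []
    moving_forward = False
    for page in path:
        if page in current:
            # Backward reference: the forward path just ended is maximal.
            if moving_forward:
                transactions.append(list(current))
            # Resume from the revisited page.
            current = current[: current.index(page) + 1]
            moving_forward = False
        else:
            current.append(page)
            moving_forward = True
    # A session that ends on a forward move leaves one more maximal path.
    if moving_forward and current:
        transactions.append(list(current))
    return transactions


def content_only_transaction(
    visits: Sequence[Tuple[str, float]], cutoff_seconds: float
) -> List[str]:
    """Keep pages whose reference length exceeds a cutoff.

    `visits` holds (page, reference_length_in_seconds) pairs. The cutoff
    is a hypothetical parameter here; how it is chosen is not covered by
    this sketch.
    """
    return [page for page, length in visits if length > cutoff_seconds]
```

For the access path A B C D C B E G H G W A O U O V, `maximal_forward_references` yields the travel-path transactions ABCD, ABEGH, ABEGW, AOU, and AOV. `content_only_transaction` simply drops pages viewed for less than the cutoff, treating them as navigational rather than content pages.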