Diamanto Oikonomopoulou, Maria Rigou, S. Sirmakessis, A. Tsakalidis
{"title":"Full-Coverage Web Prediction based on Web Usage Mining and Site Topology","authors":"Diamanto Oikonomopoulou, Maria Rigou, S. Sirmakessis, A. Tsakalidis","doi":"10.1109/WI.2004.71","DOIUrl":null,"url":null,"abstract":"Understanding and modeling user online behavior, as well as predicting future requests remain an open challenge for researchers, analysts and marketers. In this paper, we propose an efficient prediction schema based on the extraction of sequential navigation patterns from server log files, combined with web site topology. Traversed paths are monitored, internally recorded and cleaned before being completed with cashed page views. After session and episode identification follows the construction of n-grams. Prediction is based upon a 5 + n-gram schema with all lower level n-grams participating, a procedure that resembles the construction of an All 5th-order Markov Model. The schema achieves full coverage while maintaining competitive prediction precision.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI.2004.71","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14
Abstract
Understanding and modeling user online behavior, as well as predicting future requests remain an open challenge for researchers, analysts and marketers. In this paper, we propose an efficient prediction schema based on the extraction of sequential navigation patterns from server log files, combined with web site topology. Traversed paths are monitored, internally recorded and cleaned before being completed with cashed page views. After session and episode identification follows the construction of n-grams. Prediction is based upon a 5 + n-gram schema with all lower level n-grams participating, a procedure that resembles the construction of an All 5th-order Markov Model. The schema achieves full coverage while maintaining competitive prediction precision.