Mining High Utility Web Access Sequences in Dynamic Web Log Data

2010 11th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
Pub Date: 2010-06-09
DOI: 10.1109/SNPD.2010.21

Chowdhury Farhan Ahmed, S. Tanbeer, Byeong-Soo Jeong


Citations: 67

Abstract

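Mining web access sequences can discover very useful knowledge from web logs, with broad applications. By treating the non-binary occurrences of web pages in web access sequences as internal utilities (e.g., the time each user spends on a web page), more realistic information can be extracted. However, the existing utility-based approach has several limitations: it considers only forward references of web access sequences, is not applicable to incremental mining, suffers from the level-wise candidate generation-and-test methodology, needs several database scans, and does not show how to mine web traversal sequences with external utility, i.e., the different impact or significance of different web pages. In this paper, we propose a new approach that solves these problems. Moreover, we propose two novel tree structures, called UWAS-tree (utility-based web access sequence tree) and IUWAS-tree (incremental UWAS-tree), for mining web access sequences in static and dynamic databases, respectively. Our approach handles both forward and backward references and both static and dynamic data, avoids the level-wise candidate generation-and-test methodology, does not scan databases several times, and considers both the internal and external utilities of a web page. Extensive performance analyses show that our approach is very efficient for both static and incremental mining of high utility web access sequences.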
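The abstract describes internal and external utilities only in words. The Python sketch below illustrates one plausible reading under stated assumptions, not the paper's actual method. It assumes that the utility of a single page visit is its internal utility (e.g., seconds spent) multiplied by the page's external utility (a significance weight), and that a sequence's utility is the sum over its visits. The page weights in `EXTERNAL_UTILITY` are made-up values. `sequence_utility`, `swu_per_page`, `Node`, and `PrefixTree` are hypothetical names. The tree is a simplified prefix tree in the spirit of UWAS-tree/IUWAS-tree: it keeps backward references (revisited pages) instead of cutting them away, and it accepts new sequences without a rebuild. Its node layout is an assumption, and it does not reproduce the paper's mining procedure.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical external utilities (page significance weights); not from the paper.
EXTERNAL_UTILITY = {"a": 2, "b": 5, "c": 1, "d": 3}

# A web access sequence: ordered (page, internal_utility) pairs, where internal
# utility is e.g. seconds spent on the page. Repeated pages (backward references,
# such as returning to "a") are kept rather than split into forward references.
Sequence = list[tuple[str, int]]


def sequence_utility(seq: Sequence, ext: dict[str, int]) -> int:
    """Sum of internal x external utility over every page visit (assumed definition)."""
    return sum(iu * ext[page] for page, iu in seq)


def swu_per_page(db: list[Sequence], ext: dict[str, int]) -> dict[str, int]:
    """Sequence-weighted utility: for each page, the total utility of the sequences
    containing it. Used as a pruning overestimate in utility mining (assumption)."""
    swu: dict[str, int] = defaultdict(int)
    for seq in db:
        su = sequence_utility(seq, ext)
        for page in {p for p, _ in seq}:  # count each sequence once per page
            swu[page] += su
    return dict(sorted(swu.items()))


@dataclass
class Node:
    page: str
    utility: int = 0  # accumulated utility of the sequences passing through this node
    children: dict[str, "Node"] = field(default_factory=dict)


class PrefixTree:
    """Simplified prefix tree in the spirit of UWAS-tree; the node layout is an assumption."""

    def __init__(self, ext: dict[str, int]) -> None:
        self.ext = ext
        self.root = Node(page="")

    def insert(self, seq: Sequence) -> None:
        # Incremental: new sequences are added to the existing tree without a rebuild.
        su = sequence_utility(seq, self.ext)
        node = self.root
        for page, _ in seq:
            node = node.children.setdefault(page, Node(page))
            node.utility += su


if __name__ == "__main__":
    db = [
        [("a", 3), ("b", 2), ("a", 1), ("c", 4)],  # "a" revisited: backward reference
        [("a", 2), ("b", 1), ("d", 2)],
    ]
    tree = PrefixTree(EXTERNAL_UTILITY)
    for s in db:
        tree.insert(s)
    print(sequence_utility(db[0], EXTERNAL_UTILITY))  # 3*2 + 2*5 + 1*2 + 4*1 = 22
    print(swu_per_page(db, EXTERNAL_UTILITY))  # {'a': 37, 'b': 37, 'c': 22, 'd': 15}
    tree.insert([("a", 1), ("c", 5)])  # newly appended log data, inserted incrementally
    print(tree.root.children["a"].utility)  # 22 + 15 + 7 = 44
```

The example shows why external utility matters. Page "b" is visited for only 3 seconds in total, but its weight of 5 makes it contribute as much to the first sequence as page "a" does. A purely time-based (internal-only) measure would miss this.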




