Analysis and Forecasting of Web Content Dynamics
M. Calzarossa, D. Tessera
2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA), published 2018-05-01
DOI: https://doi.org/10.1109/WAINA.2018.00056
Citations: 3
Abstract
Web content changes have a strong impact on search engines and, more generally, on technologies dealing with content retrieval and management. These technologies have to take the temporal patterns of these changes into account and adjust their crawling policies accordingly. This paper presents a methodological framework, based on time series analysis, for modeling and predicting the dynamics of content changes. To test this framework, we analyze the content of three major news websites whose change patterns are characterized by large fluctuations and significant differences across days and hours. The classical decomposition of the observed time series into trend, seasonal, and irregular components is applied to identify the weekly and daily patterns as well as the remaining fluctuations. The corresponding models are used to predict the future dynamics of the sites based on their current and historical behavior.
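The classical decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual models, so everything below is an assumption made for illustration: an additive form (observation = trend + seasonal + irregular), a centered moving average as the trend estimate, a single seasonal period, and a naive forecast that holds the last trend value constant. The function names and the synthetic change counts are hypothetical.

```python
from statistics import mean


def centered_moving_average(series, period):
    """Trend estimate; None where the averaging window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points of the window get half weight.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def classical_decompose(series, period):
    """Additive decomposition: series = trend + seasonal + irregular."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods of observations")
    trend = centered_moving_average(series, period)
    # Collect detrended values for each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    raw = [mean(b) for b in buckets]
    offset = mean(raw)  # center the seasonal indices so they average to zero
    index = [s - offset for s in raw]
    seasonal = [index[t % period] for t in range(len(series))]
    irregular = [
        None if m is None else y - m - s
        for y, m, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, irregular


def naive_forecast(series, period, horizon):
    """Hold the last available trend value and add the seasonal index."""
    trend, seasonal, _ = classical_decompose(series, period)
    last_trend = [m for m in trend if m is not None][-1]
    n = len(series)
    # seasonal[0:period] holds one full cycle of seasonal indices.
    return [last_trend + seasonal[(n + h) % period] for h in range(horizon)]


# Hypothetical data: change counts with a level of 100 and a cycle of length 4.
pattern = [10, -5, -10, 5]
observed = [100 + pattern[t % 4] for t in range(16)]
predicted = naive_forecast(observed, period=4, horizon=4)
```

For data with both daily and weekly patterns, as described in the abstract, one option would be to apply such a decomposition with a weekly period on hourly observations, or to apply it twice with different periods. That choice is also an assumption here, not something the abstract specifies.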