An Adaptive Synchronization Policy for Harvesting OAI-PMH Repositories
N. Adly
2009 First International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA), March 2009
DOI: 10.1109/DBKDA.2009.9 (https://doi.org/10.1109/DBKDA.2009.9)
Citation count: 1
Abstract
Metadata harvesting requires timely propagation of up-to-date information from thousands of Repositories over a wide area network. It is desirable to keep the data as fresh as possible while limiting the overhead on the Harvester. An important consideration is that Repositories vary widely in their update patterns: a Repository may experience different update rates at different times, or its update pattern may change unexpectedly. In this paper, we define data Freshness metrics and propose an adaptive algorithm for synchronizing the Harvester with the Repositories. The algorithm aims to meet a desired level of Freshness while incurring the minimum overhead on the Harvester. We compare different synchronization policies within the devised framework and show that the proposed policy outperforms the others, especially for heterogeneous update patterns.
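To make the Freshness-versus-overhead trade-off concrete, here is a minimal sketch in Python. It is not the paper's algorithm; it rests on assumptions that are not taken from the paper. Each Repository is assumed to change as a Poisson process with rate λ, and Freshness is taken to be the fraction of time the Harvester's copy is up to date. Under that model the copy is still fresh t time units after a sync with probability exp(-λt), so syncing every I units gives an expected Freshness of (1 - exp(-λI)) / (λI). The sketch picks, per Repository, the longest interval that still meets a target Freshness (fewest harvests, hence least overhead), and adapts by re-estimating λ with an exponentially weighted moving average. The names expected_freshness, interval_for_target and AdaptiveScheduler, the EWMA estimator, and all default parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


def expected_freshness(rate: float, interval: float) -> float:
    """Expected time-averaged Freshness of a copy synced every `interval` time
    units when the source changes as a Poisson process with `rate` changes per
    time unit (an assumed model): (1 - exp(-rate*interval)) / (rate*interval)."""
    x = rate * interval
    if x == 0.0:
        return 1.0  # no expected changes, so the copy never goes stale
    return -math.expm1(-x) / x  # expm1 keeps precision when x is small


def interval_for_target(rate: float, target: float,
                        max_interval: float = 30.0) -> float:
    """Longest sync interval (i.e. fewest harvests) whose expected Freshness
    still meets `target`. Freshness falls as the interval grows, so bisection
    on [0, max_interval] finds the crossing point."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if rate <= 0.0 or expected_freshness(rate, max_interval) >= target:
        return max_interval  # even the slowest allowed schedule is fresh enough
    lo, hi = 0.0, max_interval  # invariant: F(lo) >= target > F(hi)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_freshness(rate, mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


@dataclass
class AdaptiveScheduler:
    """Hypothetical per-Repository scheduler: smooths observed update rates
    with an exponentially weighted moving average (EWMA) and re-derives the
    sync interval after every harvest."""
    target: float = 0.8    # desired expected Freshness (assumed value)
    alpha: float = 0.3     # EWMA weight given to the newest sample (assumed)
    rate: float = 0.0      # current update-rate estimate, updates per day
    interval: float = 1.0  # current sync interval, in days

    def observe(self, updates: int, elapsed: float) -> float:
        """Record `updates` changes seen over `elapsed` days since the last
        harvest and return the interval to wait before the next one."""
        if elapsed <= 0.0:
            raise ValueError("elapsed must be positive")
        sample = updates / elapsed
        self.rate = self.alpha * sample + (1.0 - self.alpha) * self.rate
        self.interval = interval_for_target(self.rate, self.target)
        return self.interval


if __name__ == "__main__":
    sched = AdaptiveScheduler(target=0.8)
    # Illustrative observations: a Repository that is busy, then goes quiet.
    for updates, elapsed in [(6, 1.0), (5, 1.0), (4, 1.0), (0, 2.0), (0, 4.0)]:
        nxt = sched.observe(updates, elapsed)
        print(f"rate~{sched.rate:.2f}/day -> next harvest in {nxt:.2f} days")
```

The point of the design is that the interval is chosen per Repository rather than globally: quiet Repositories are harvested less often and busy ones more often, which is one way to meet a Freshness target at lower total cost when update patterns are heterogeneous. A real harvester would also have to bound its total load and correct for estimator bias. For example, an incremental OAI-PMH harvest reports each changed record once, even if it was updated several times since the last harvest. Both concerns are outside this sketch.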