Error Analysis on Harvesting Data over the Internet

S. Kapidakis
Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference, published 2018-06-26
DOI: https://doi.org/10.1145/3197768.3201537
Citation count: 3

Abstract

Harvesting tasks gather information into a central repository. We studied 880560 harvesting tasks from 3446 harvesting services in 354 harvesting rounds over a period of 15 months; 382705 of these tasks failed, and the remaining tasks occasionally returned fewer records. A significant part of the Open Archives Initiative harvesting services never worked or have ceased working, while many other services fail occasionally. A harvesting task includes many stages of information exchange, and each of them may fail, with different consequences each time. We studied the reported warning messages, the number of records returned, and the required response time to discover relations among them. We found that about half of the harvesting tasks in each harvesting round fail, and that the number of failing tasks is slowly increasing. We developed a method of analysis that can be used to reverse engineer such complex network systems and to categorize the reasons for failure into useful classes. Our results do not point to a new approach to harvesting or lead to breakthrough advice, but they make clear the complexity of the operation in an ever-changing networking environment and alert the reader that some facts that may be considered trivial are in fact not. They help us to better understand the risks involved, and to design more reliable procedures and improved ways to monitor them closely.
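
The totals quoted in the abstract can be turned into an aggregate failure rate, and the same bookkeeping can be applied per round. The sketch below is only illustrative: the three constants come from the abstract, while the `TaskResult` record layout and the `failure_rate_by_round` helper are hypothetical names introduced here, not the author's method or data format. Aggregated over all rounds, the reported totals give a failure rate of roughly 43%, which is in line with the "about half" stated for individual rounds.

```python
from collections import defaultdict
from dataclasses import dataclass

# Totals reported in the abstract.
TOTAL_TASKS = 880560
FAILED_TASKS = 382705
ROUNDS = 354

# Aggregate figures derived from those totals.
overall_failure_rate = FAILED_TASKS / TOTAL_TASKS
mean_tasks_per_round = TOTAL_TASKS / ROUNDS


@dataclass
class TaskResult:
    """One harvesting task outcome (hypothetical record layout)."""
    round_id: int
    service: str
    failed: bool
    records: int = 0


def failure_rate_by_round(results):
    """Return {round_id: fraction of failed tasks} for the given results."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in results:
        totals[r.round_id] += 1
        if r.failed:
            failures[r.round_id] += 1
    return {rid: failures[rid] / totals[rid] for rid in totals}


if __name__ == "__main__":
    print(f"overall failure rate: {overall_failure_rate:.2%}")
    print(f"mean tasks per round: {mean_tasks_per_round:.1f}")
```

Tracking the rate per round rather than only in aggregate is what makes a slow upward trend in failing tasks visible; the service names used with `TaskResult` in any real run would come from the harvesting logs.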