Creating order from the mess: web archive derivative datasets and notebooks

IF 0.8 · CAS Zone 3 (Sociology) · JCR category: HUMANITIES, MULTIDISCIPLINARY · Archives and Records-The Journal of the Archives and Records Association · Pub Date: 2022-09-02 · DOI: 10.1080/23257962.2022.2100336
Nick Ruest, Samantha Fritz, Ian Milligan
Volume 43, pp. 316-331. Journal Article. https://doi.org/10.1080/23257962.2022.2100336
Citations: 2

Abstract

For a quarter-century, memory institutions have been preserving web-based content. These web archives have been collected and stored in ARC and WARC (W/ARC) file formats and will form a basis for contemporary histories. Yet these formats present significant challenges to researchers who wish to access and use web archival data, primarily because of the nature of collecting, storing, and providing access to these multifaceted digital objects. In other words, web archives are messy. Applying traditional archival methods of description to digital-born collections is complicated by issues of provenance, original order, and scale. However, we believe that archival description offers a practical starting point for thinking about access. This paper argues that a robust finding aid must extend beyond basic collection-level description to allow for more meaningful interactions with web archives. As such, we propose reimagining the traditional finding-aid model as a three-level mode of description that includes computational methods, the generation of derivative datasets, and interactive, code-rich notebooks. Together, these three elements contribute to expanded access to and use of web archives.