The Measure of the Archive: The Robustness of Network Analysis in Early Modern Correspondence

Journal of Cultural Analytics (Q1, Arts and Humanities) | Pub Date: 2021-07-21 | DOI: 10.22148/001C.25943
Y. Ryan, S. Ahnert
Citations: 3

Abstract

The Measure of the Archive: The Robustness of Network Analysis in Early Modern Correspondence
Network analysis of historical correspondence can be a fruitful way to address historical research questions, and has been increasingly used in historical studies over the past decade. As with many areas of quantitative humanities research, the reliability of the results is often called into question, given that such approaches require 'hard data' as input, yet almost inevitably use datasets with partial or missing records. Other disciplines using network analysis have conducted robustness experiments designed to test the impact of data loss or error on their results. To test how this missing data might affect our own area of research, we conducted a number of experiments designed to simulate the impact of the kinds of loss often seen in historical correspondence data, including random document loss, missing years, and errors in the disambiguation and de-duplication process. The results show that most network centrality measures maintain robustness until a very large proportion of the data (60% or more) is removed. Some measures showed a linear change in robustness, while others remained high and then fell off sharply. Only one, transitivity (local clustering coefficient), was significantly impacted throughout. We tested a range of data loss scenarios (random single letters, folio books of manuscript letters, catalogues, and entire years) and a range of commonly used network metrics. In addition, we tested the robustness of more complex network analysis results in the literature that combine several network metrics to highlight individuals in the network, and found that the same types of individuals would likely have been highlighted even with 50% random letter loss. Alongside the article is a web application, built using Shiny, which will calculate robustness measures for a user-uploaded network dataset. We conclude that researchers working with similar historical correspondence datasets might be able to consider network analysis results to be robust in most cases, rather than work on the assumption that missing data would lead to very different findings or results.
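The random letter loss experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or their Shiny application. It assumes a hypothetical list of (sender, recipient) letter records, uses weighted degree as the only network metric, and scores robustness as the Spearman rank correlation between the full and the sampled network; the abstract does not state which comparison statistic the article uses. All function names and the synthetic data are illustrative.

```python
import math
import random
from collections import Counter


def toy_letters(n_letters=200, n_people=15, seed=1):
    """Build a synthetic (hypothetical) letter list with a few prolific writers."""
    rng = random.Random(seed)
    people = [f"person_{i}" for i in range(n_people)]
    weights = [1 / (i + 1) for i in range(n_people)]  # skewed activity
    letters = []
    while len(letters) < n_letters:
        sender, recipient = rng.choices(people, weights=weights, k=2)
        if sender != recipient:
            letters.append((sender, recipient))
    return letters


def degree_counts(letters, people):
    """Count letters sent or received per person (weighted degree)."""
    counts = Counter()
    for sender, recipient in letters:
        counts[sender] += 1
        counts[recipient] += 1
    return [counts[p] for p in people]


def average_ranks(values):
    """Rank values (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / math.sqrt(var_x * var_y)


def robustness_curve(letters, fractions, trials=20, seed=0):
    """Mean rank correlation of full vs. sampled degree for each loss fraction."""
    rng = random.Random(seed)
    people = sorted({p for letter in letters for p in letter})
    full = degree_counts(letters, people)
    curve = {}
    for frac in fractions:
        keep = round(len(letters) * (1 - frac))
        scores = []
        for _ in range(trials):
            sample = rng.sample(letters, keep)  # random single-letter loss
            scores.append(spearman(full, degree_counts(sample, people)))
        curve[frac] = sum(scores) / len(scores)
    return curve


if __name__ == "__main__":
    for frac, score in robustness_curve(toy_letters(), [0.0, 0.3, 0.6]).items():
        print(f"{frac:.0%} of letters removed: mean rank correlation {score:.3f}")
```

People who disappear from a sample keep a degree of zero rather than being dropped, so the full and sampled rankings always cover the same individuals. The other loss scenarios named in the abstract (folio books, catalogues, entire years) would need a grouping field on each letter record, which this sketch does not model.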
Source journal: Journal of Cultural Analytics (Arts and Humanities: Literature and Literary Theory)
CiteScore: 2.90
Self-citation rate: 0.00%
Publication volume: 9
Review time: 10 weeks