Challenges and Opportunities for Digital History

I. Gregory
Journal: Frontiers Digit. Humanit.
Publication date: 2014-12-17
Publication type: Journal Article
DOI: 10.3389/fdigh.2014.00001 (https://doi.org/10.3389/fdigh.2014.00001)
Citations: 13

Abstract

The challenge for digital historians is deceptively simple: it is to do good history that combines the computer’s ability to search and summarize with the researcher’s ability to interpret and argue. This involves both developing an understanding of how to use digital sources appropriately and, more importantly, using digital sources and methods to deliver new scholarship that enhances our understanding of the past. There are plenty of sources available; the challenge is to make use of them to deliver on their potential.

There have been false dawns for digital history, or “history and computing,” in the past (Boonstra et al. 2004). Until very recently, computers were primarily associated with performing calculations on numbers. This has made them fundamental tools in fields such as economic history, historical demography and, through the use of geographical information systems (GIS), historical geography. These are, however, relatively small fields within the discipline as a whole, and much of the work done in them has taken place outside History departments in, for example, Economics, Sociology, and Geography. As most historians work with texts, it is hardly surprising that this style of computing has made little impact on the wider discipline. Within the last few years, however, there has been a fundamental shift in computing in which, put simply, computers have moved from being number-crunching machines to being an information technology in which much of the information is in textual form. This has been associated with the creation of truly massive amounts of digital textual content. This ranges from social media and the internet, to private sector digitization projects such as Google Books and the Gale/Cengage collections, to the more limited investment from the academic and charitable sectors (Thomas and Johnson 2013).
Thus, computers are now inextricably concerned with texts – exactly the type of source that is central to the study of history. As a consequence, many historians have become “digital historians” almost without realizing it, by making use of the vast number of sources that are now available from their desktop.

So is everything in the garden that is digital history currently rosy? The answer, judging by work such as Hitchcock (2013) and the responses to it (Knights 2013; Prescott 2013), seems to be a resounding no. Many criticisms are centered on the digital sources themselves, whose quality is lower than might be hoped. Digitizing a document is usually a two-stage process: first a digital image of the document is created as a bitmap, then the textual content is encoded as machine-readable text. The two are then often brought together so that a user can type a search term, the term is located in the text, and the user is then shown the appropriate image of the page. The first of the two stages is relatively simple using a scanner or camera and, if done properly, results in only relatively minor abstractions from the original, as the result is a facsimile copy. The second stage, however, is hugely problematic: it involves either typing the text manually or using optical character recognition (OCR) software to identify letters automatically from the bitmap image. Both of these are slow, expensive, and error-prone. OCR tends to be used on large-scale projects: it is faster and cheaper but tends to result in far more errors. Whatever approach is used, checking the results is very difficult. Common approaches involve carefully typing up “gold standard” samples of parts of the source and comparing these with bulk-entered material to give a percentage of words or letters that have errors. Understanding what these scores mean in practice is difficult.
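As an illustration of the gold-standard comparison just described, the following is a minimal sketch, not the procedure of any particular digitization project. It assumes that "percentage of words or letters that have errors" is measured as the edit (Levenshtein) distance between a gold-standard sample and the bulk-entered text, divided by the length of the gold-standard sample. The function names and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # unit missing from hyp
                            curr[j - 1] + 1,      # extra unit in hyp
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(gold, bulk, unit="char"):
    """Errors in `bulk` as a percentage of the units in the `gold` sample."""
    if unit == "word":
        ref, hyp = gold.split(), bulk.split()
    else:
        ref, hyp = gold, bulk
    if not ref:
        raise ValueError("gold-standard sample is empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical sample: a carefully typed line and its OCR counterpart.
gold = "the parish register of births"
bulk = "tbe parish regifter of births"

print(f"character error rate: {error_rate(gold, bulk, 'char'):.1f}%")
print(f"word error rate: {error_rate(gold, bulk, 'word'):.1f}%")
```

In this toy sample two wrong letters out of 29 spoil two words out of five, so the same text scores very differently by letter and by word. This is one reason such percentages are hard to interpret in practice.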
Even without error, if the text is separated from the page scans it is heavily abstracted from the original and much potentially useful information is lost.

Once created, digital sources are often interrogated using techniques that are not properly understood but are nevertheless used uncritically. The classic example that combines both the data capture and uncritical use problems is typing a keyword search into a web interface, which returns a list of hits sorted by “relevance.” As Hitchcock (2013) points out, most historians using digital sources do this without having any idea of the implications either of the data capture that created the digital copy of the source, and thus whether the search will miss words as a result of spelling variations derived from digitization errors, or of how the search engine decides what is – and, more importantly, is not – “relevant.” While using search engines may be problematic, in reality they are the only digital tool that most historians use; indeed, there is a lack of widely used techniques for interrogating, summarizing, and understanding the large volumes of material that are available.

So what do digital historians need to do? The answer, I would argue, is to remember that they are first and foremost historians and that historians fundamentally are in the business of taking complex, incomplete sources that are full of biases and errors, and interpreting them critically to develop an argument that answers a research question. Digital sources do not change this;
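The point about keyword searches missing spelling variants can be made concrete with a small sketch. It is an illustration under stated assumptions, not a description of how any real search engine works: the sample sentence and its OCR-style misspellings are invented, the function name `fuzzy_hits` is hypothetical, and the similarity cutoff of 0.8 is an arbitrary choice. It uses `difflib.get_close_matches` from the Python standard library to compare an exact match with an approximate one.

```python
import difflib
import re


def fuzzy_hits(term, text, cutoff=0.8):
    """Return distinct words in `text` whose similarity to `term` is at least `cutoff`."""
    tokens = sorted(set(re.findall(r"[a-z]+", text.lower())))
    # n must be positive, so fall back to 1 when the text has no tokens.
    return difflib.get_close_matches(term.lower(), tokens,
                                     n=len(tokens) or 1, cutoff=cutoff)


# Invented sample with two OCR-style misreadings of "parish".
sample = ("The vestry met at the parifh church; "
          "the parish clerk kept the parlsh accounts.")

exact = re.findall(r"\bparish\b", sample.lower())
print("exact matches:", len(exact))
print("fuzzy matches:", fuzzy_hits("parish", sample))
```

An exact search finds only the correctly transcribed occurrence, while the approximate search also recovers the two misread forms. The price is that a looser cutoff starts to admit unrelated words, so the threshold itself becomes something the historian has to understand and justify.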