The Page Image: Towards a Visual History of Digital Documents

Book History (IF 0.5, JCR Q1, History) | Pub Date: 2020-10-22 | DOI: 10.1353/bh.2020.0010
Andrew Piper, Chad Wellmon, M. Cheriet
Book History 23 (1), pp. 365-397. Journal Article.
Citations: 4

Abstract

France, to convene the first ever conference on “document analysis and recognition.”1 The meeting brought together researchers from all over the world who for roughly the previous decade had been slowly changing the paradigm through which they approached the problem of the machinic understanding of the digitized page. Instead of thinking in terms of “characters” and “recognition,” which underlay the long-standing field of Optical Character Recognition (OCR), they were gradually moving towards a more global and formal understanding of the page image as a whole. Researchers in the field of Document Image Analysis, or DIA as it came to be known, discarded the common assumption that the letter or the text was the ultimate referent of the bibliographic page. They focused instead on the heterogenous visual qualities of the page, or what they termed “the page image.” “Document image analysis,” writes George Nagy in a survey of twenty years of research in the field, is the “theory and practice of recovering the symbol structure of digital images scanned from paper or produced by computer.”2 DIA researchers turned the page image into an analytical object. In moving away from a text-centric understanding of the page, research in Document Image Analysis offers an important new way of thinking about the bibliographic page that is different from what has traditionally been the case in computational approaches to studying culture, but that has deep roots in the fields of book history, bibliography, and textual studies. Whether in the guise of “natural language processing” (NLP), “optical character recognition” (OCR), or “text mining,” computational approaches to pages have remained heavily influenced by a text-centric mentality, using the page image as an (often imperfect) means to an end, an object to be passed through rather than studied as something potentially meaningful in itself. 
At the same time, the fast-growing field of “image analytics,” which ranges from facial detection to the analysis of newspaper illustrations, has largely maintained the text-image divide that has long dominated the study