What to do with 2.000.000 Historical Press Photos? The Challenges and Opportunities of Applying a Scene Detection Algorithm to a Digitised Press Photo Collection
Authors: M. Wevers, N. Vriend, Alexander De Bruin
Journal: TMG Journal for Media History (journal article, published 2022-06-07)
DOI: 10.18146/tmg.815
Citations: 3
Abstract
In 1962, Dutch celebrity Ria Kuyken was attacked by a circus bear. Cees de Boer captured the moment, for which he was awarded both a World Press Photo and the Silver Camera (Zilveren Camera). Although this photo popularised Fotopersbureau De Boer, which Cees had founded in 1945, the importance of the collection lies in its scale: approximately 2,000,000 photos of about 250,000 events, taken over sixty years and accompanied by extensive metadata. The collection represents not only major national events but also small-scale, human-interest subjects, such as the shopkeeper around the corner. Our aim is not only to digitise and publish all 2,000,000 photo negatives of Fotopersbureau De Boer but also to explore how artificial intelligence can enrich the collection, benefiting both users of the archive and cultural historians studying historical photographs. One of our efforts focuses on scene detection, a method for detecting the ‘scene’ represented in an image (Zhou et al., 2018). We will rely on transfer learning to adapt existing computer vision models to our collection and to the needs of our users. Existing models can generate labels with high accuracy; however, these labels are ahistorical and more often than not irrelevant to our collection. We will label subsets of the images via crowdsourcing to train and improve existing models. In this way, we can add labels that are relevant to our collection but absent from existing models. In this paper, we highlight the opportunities and challenges of applying artificial intelligence to a collection of historical photographs.
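The abstract does not specify which model or training procedure the project uses. As a minimal sketch of one common form of transfer learning, assume a pretrained scene model (for example, a CNN trained on the Places scene dataset) is kept frozen as a feature extractor, and a new, lightweight classification head is fitted on crowdsourced labels. Because only the head is trained, labels missing from the original model can be added. For brevity, a nearest-centroid head stands in for the linear layer that is usually trained. The names `frozen_backbone`, `train_head` and `predict`, the toy feature vectors, and the labels `circus` and `shop_interior` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def frozen_backbone(image: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pretrained scene model with its original
    classification layer removed. To stay self-contained it only
    L2-normalises its input; a real pipeline would return the network's
    penultimate-layer activations for an image."""
    norm = math.sqrt(sum(x * x for x in image)) or 1.0  # avoid division by zero
    return [x / norm for x in image]


def train_head(
    examples: List[Tuple[Sequence[float], str]],
    backbone: Callable[[Sequence[float]], Vector],
) -> Dict[str, Vector]:
    """Fit a nearest-centroid head on crowdsourced (image, label) pairs.
    The backbone is never updated; only one centroid per label is learned,
    so the label set can differ entirely from the original model's."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for image, label in examples:
        feats = backbone(image)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    # Average the summed features to obtain each label's centroid.
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(
    image: Sequence[float],
    head: Dict[str, Vector],
    backbone: Callable[[Sequence[float]], Vector],
) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    feats = backbone(image)

    def sq_dist(centroid: Vector) -> float:
        return sum((f - c) ** 2 for f, c in zip(feats, centroid))

    return min(head, key=lambda label: sq_dist(head[label]))


# Toy "images": hypothetical 3-dimensional inputs with crowdsourced labels.
examples = [
    ([1.0, 0.1, 0.0], "circus"),
    ([0.9, 0.2, 0.0], "circus"),
    ([0.0, 0.1, 1.0], "shop_interior"),
    ([0.1, 0.0, 0.8], "shop_interior"),
]
head = train_head(examples, frozen_backbone)
print(predict([0.8, 0.0, 0.1], head, frozen_backbone))  # → circus
```

Because the backbone stays fixed, only the small head has to be refitted when volunteers contribute new labels. In practice a trained linear or softmax layer, rather than centroids, would usually sit on top of the frozen features.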