Hanane Ariouat, Youcef Sklab, M. Pignal, Régine Vignes Lebbe, Jean-Daniel Zucker, Edi Prifti, E. Chenin
{"title":"Extracting Masks from Herbarium Specimen Images Based on Object Detection and Image Segmentation Techniques","authors":"Hanane Ariouat, Youcef Sklab, M. Pignal, Régine Vignes Lebbe, Jean-Daniel Zucker, Edi Prifti, E. Chenin","doi":"10.3897/biss.7.112161","DOIUrl":null,"url":null,"abstract":"Herbarium specimen scans constitute a valuable source of raw data. Herbarium collections are gaining interest in the scientific community as their exploration can lead to understanding serious threats to biodiversity. Data derived from scanned specimen images can be analyzed to answer important questions such as how plants respond to climate change, how different species respond to biotic and abiotic influences, or what role a species plays within an ecosystem. However, exploiting such large collections is challenging and requires automatic processing. A promising solution lies in the use of computer-based processing techniques, such as Deep Learning (DL). But herbarium specimens can be difficult to process and analyze as they contain several kinds of visual noise, including information labels, scale bars, color palettes, envelopes containing seeds or other organs, collection-specific barcodes, stamps, and other notes that are placed on the mounting sheet. Moreover, the paper on which the specimens are mounted can degrade over time for multiple reasons, and often the paper's color darkens and, in some cases, approaches the color of the plants.\n Neural network models are well-suited to the analysis of herbarium specimens, while making abstraction of the presence of such visual noise. However, in some cases the model can focus on these elements, which eventually can lead to a bad generalization when analyzing new data on which these visual elements are not present (White et al. 2020). It is important to remove the noise from specimen scans before using them in model training and testing to improve its performance. 
Studies have used basic cropping techniques (Younis et al. 2018), but they do not guarantee that the visual noise is removed from the cropped image. For instance, the labels are frequently put at random positions into the scans, resulting in cropped images that still contain noise. White et al. (2020) used the Otsu binarization method followed by a manual post-processing and a blurring step to adjust the pixels that should have been assigned to black during segmentation. Hussein et al. (2020) used an image labeler application, followed by a median filtering method to reduce the noise. However, both White et al. (2020) and Hussein et al. (2020) consider only two organs: stems and leaves. Triki et al. (2022) used a polygon-based deep learning object detection algorithm. But in addition to being laborious and difficult, this approach does not give good results when it comes to fully identifying specimens. \n In this work, we aim to create clean high-resolution mask extractions with the same resolution as the original images. These masks can be used by other models for a variety of purposes, for instance to distinguish the different plant organs. Here, we proceed by combining object detection and image segmentation techniques, using a dataset of scanned herbarium specimens. We propose an algorithm that identifies and retains the pixels belonging to the plant specimen, and removes the other pixels that are part of non-plant elements considered as noise. A removed pixel is set to zero (black). Fig. 1 illustrates the complete masking pipeline in two main stages, object detection and image segmentation.\n In the first stage, we manually annotated the images using bounding boxes in a dataset of 950 images. We identified (Fig. 2) the visual elements considered to be noise (e.g., scale-bar, barcode, stamp, text box, color pallet, envelope). Then we trained the model to automatically remove the noise elements. 
We divided the dataset into 80% training, 10% validation and 10% test set. We ultimately achieved a precision score of 98.2%, which is a 3% improvement from the baseline. Next, the results of this stage were used as input for image segmentation, which aimed to generate the final mask. We blacken the pixels covered by the detected noise elements, then we used HSV (Hue Saturation Value) color segmentation to select only the pixels with values in a range that corresponds mostly to a plant color. Finally, we applied the morphological opening operation that removes noise and separates objects; and the closing operation that fills gaps, as described in Sunil Bhutada et al. (2022) to remove the remaining noise. The output here is a generated mask that retains only the pixels that belong to the plant. Unlike other proposed approaches, which focus essentially on leaves and stems, our approach covers all the plant organs (Fig. 3). \n Our approach removes the background noise from herbarium scans and extracts clean plant images. It is an important step before using these images in different deep learning models. However, the quality of the extractions varies depending on the quality of the scans, the condition of the specimens, and the paper used. For example, extractions made from samples where the color of the plant is different from the color of the background were more accurate than extractions made from samples where the color of the plant and background are close. 
To overcome this limitation, we aim to use some of the obtained extractions to create a training dataset, followed by the development and the training of a generative deep learning model to generate masks that delimit plants.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodiversity Information Science and Standards","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3897/biss.7.112161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Herbarium specimen scans constitute a valuable source of raw data. Herbarium collections are attracting growing interest from the scientific community, because exploring them can improve our understanding of serious threats to biodiversity. Data derived from scanned specimen images can be analyzed to answer important questions such as how plants respond to climate change, how different species respond to biotic and abiotic influences, or what role a species plays within an ecosystem. However, exploiting such large collections is challenging and requires automatic processing. A promising solution lies in computer-based processing techniques such as Deep Learning (DL). Yet herbarium specimens can be difficult to process and analyze because they contain several kinds of visual noise, including information labels, scale bars, color palettes, envelopes containing seeds or other organs, collection-specific barcodes, stamps, and other notes placed on the mounting sheet. Moreover, the paper on which the specimens are mounted can degrade over time for multiple reasons; often the paper's color darkens and, in some cases, approaches the color of the plants.
Neural network models are well-suited to the analysis of herbarium specimens, provided they can abstract away such visual noise. In some cases, however, a model can focus on these elements, which can lead to poor generalization when analyzing new data in which these visual elements are not present (White et al. 2020). It is therefore important to remove the noise from specimen scans before using them in model training and testing, in order to improve model performance. Previous studies have used basic cropping techniques (Younis et al. 2018), but these do not guarantee that the visual noise is removed from the cropped image. For instance, labels are frequently placed at arbitrary positions on the scans, so cropped images may still contain noise. White et al. (2020) used the Otsu binarization method followed by a manual post-processing and a blurring step to adjust the pixels that should have been assigned to black during segmentation. Hussein et al. (2020) used an image labeler application, followed by median filtering to reduce the noise. However, both White et al. (2020) and Hussein et al. (2020) consider only two organs: stems and leaves. Triki et al. (2022) used a polygon-based deep learning object detection algorithm. But besides being laborious and difficult, this approach does not give good results when it comes to fully identifying specimens.
In this work, we aim to extract clean masks at the same resolution as the original images. These masks can be used by other models for a variety of purposes, for instance to distinguish the different plant organs. We proceed by combining object detection and image segmentation techniques, using a dataset of scanned herbarium specimens. We propose an algorithm that identifies and retains the pixels belonging to the plant specimen, and removes the pixels belonging to non-plant elements, which we treat as noise. A removed pixel is set to zero (black). Fig. 1 illustrates the complete masking pipeline in two main stages: object detection and image segmentation.
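The masking rule described above (retain plant pixels, set all others to zero) can be sketched as follows; this is a minimal illustration assuming the plant mask is already available as a boolean array, and all names are hypothetical:

```python
import numpy as np

def apply_plant_mask(image, plant_mask):
    """Blacken every pixel not flagged as plant.

    image: (H, W, 3) uint8 array; plant_mask: (H, W) boolean array
    where True marks a plant pixel.
    """
    out = image.copy()
    out[~plant_mask] = 0  # removed pixels are set to zero (black)
    return out

# Tiny example: a 2x2 image where two pixels belong to the plant.
img = np.array([[[10, 200, 30], [255, 255, 255]],
                [[120, 120, 120], [90, 160, 40]]], dtype=np.uint8)
mask = np.array([[True, False],
                 [False, True]])
masked = apply_plant_mask(img, mask)
```

Because the mask is applied at full resolution, the output keeps the same dimensions as the original scan, as required for the downstream models.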
In the first stage, we manually annotated a dataset of 950 images using bounding boxes, identifying the visual elements considered to be noise (e.g., scale bar, barcode, stamp, text box, color palette, envelope; Fig. 2). We then trained the model to automatically detect these noise elements. We divided the dataset into an 80% training, 10% validation, and 10% test set, and ultimately achieved a precision score of 98.2%, a 3% improvement over the baseline. Next, the results of this stage were used as input for image segmentation, which aimed to generate the final mask. We blackened the pixels covered by the detected noise elements, then used HSV (Hue, Saturation, Value) color segmentation to select only the pixels whose values fall in a range that corresponds mostly to plant colors. Finally, to remove the remaining noise, we applied the morphological opening operation, which removes noise and separates objects, and the closing operation, which fills gaps, as described in Bhutada et al. (2022). The output is a generated mask that retains only the pixels belonging to the plant. Unlike other proposed approaches, which focus essentially on leaves and stems, our approach covers all the plant organs (Fig. 3).
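The segmentation stage above (blacken detected noise boxes, HSV color thresholding, then morphological opening and closing) can be sketched with plain NumPy. This is an illustrative toy implementation, not the paper's code: the hue window and the 3x3 structuring element are assumptions, and a real pipeline would use an optimized library for the color conversion and morphology.

```python
import colorsys
import numpy as np

def blacken_boxes(image, boxes):
    """Blacken pixels inside detected noise boxes, given as (x0, y0, x1, y1)."""
    out = image.copy()
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0:x1] = 0
    return out

def hsv_plant_mask(image, h_range=(0.15, 0.45), s_min=0.2, v_min=0.1):
    """Boolean mask of pixels whose HSV values fall in a 'plant' range.

    The hue window (roughly yellow-green) is illustrative only.
    """
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            r, g, b = image[i, j] / 255.0
            hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
            mask[i, j] = (h_range[0] <= hue <= h_range[1]
                          and sat >= s_min and val >= v_min)
    return mask

def erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole neighborhood is set."""
    p = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out &= p[1 + di:1 + di + mask.shape[0],
                     1 + dj:1 + dj + mask.shape[1]]
    return out

def dilate(mask):
    """3x3 binary dilation: a pixel is set if any neighbor is set."""
    p = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out |= p[1 + di:1 + di + mask.shape[0],
                     1 + dj:1 + dj + mask.shape[1]]
    return out

def opening(mask):  # erosion then dilation: removes small specks
    return dilate(erode(mask))

def closing(mask):  # dilation then erosion: fills small gaps
    return erode(dilate(mask))

# Toy run: a white sheet with one plant-green pixel and one noise box.
img = np.full((4, 4, 3), 255, dtype=np.uint8)
img[0, 0] = (30, 200, 40)                      # plant-colored pixel
clean = blacken_boxes(img, [(1, 1, 3, 3)])     # remove a detected label region
plant = hsv_plant_mask(clean)
```

Chaining `opening` then `closing` on `plant` mirrors the cleanup step in the text: opening drops isolated false positives, closing fills pinholes inside the plant region.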
Our approach removes the background noise from herbarium scans and extracts clean plant images, an important step before using these images in different deep learning models. However, the quality of the extractions varies with the quality of the scans, the condition of the specimens, and the paper used. For example, extractions from samples where the color of the plant differs from the color of the background were more accurate than extractions from samples where the two colors are similar. To overcome this limitation, we aim to use some of the obtained extractions to create a training dataset, and then to develop and train a generative deep learning model that produces masks delimiting the plants.