Comparative Study: Evaluating the effects of class balancing on transformer performance in the PlantNet-300k image dataset

Biodiversity Information Science and Standards. Pub Date: 2023-09-21. DOI: 10.3897/biss.7.113057

José Chavarría Madriz, Maria Mora-Cross, William Ulate


Citations: 0

Abstract




Image-based identification of plant specimens plays a crucial role in fields such as agriculture, ecology, and biodiversity conservation. Growing interest in deep learning has driven remarkable advances in image classification, particularly through convolutional neural networks (CNNs). Since 2015, in the context of the PlantCLEF (Conference and Labs of the Evaluation Forum) challenge (Joly et al. 2015), deep learning models, specifically CNNs, have consistently achieved the most impressive results in this field (Carranza-Rojas 2018). More recently, transformer-based models such as ViT (Vision Transformer) (Dosovitskiy et al. 2020) and CvT (Convolutional vision Transformer) (Wu et al. 2021) have emerged as a promising alternative for image classification tasks. Transformers capture global context and handle long-range dependencies (Vaswani et al. 2017), which makes them suitable for complex recognition tasks like plant identification. In this study, we focus on image classification using the PlantNet-300k dataset (Garcin et al. 2021a). The dataset consists of 306,146 plant images representing 1,081 distinct species, selected from the Pl@ntNet citizen observatory database. Two characteristics of the dataset make classification challenging. First, there is a significant class imbalance: a small subset of species accounts for the majority of the images, which biases classification models and affects their accuracy. Second, many species are visually similar, making them difficult to identify accurately, even for experts. The dataset authors refer to these characteristics as long-tailed distribution and high intrinsic ambiguity, respectively (Garcin et al. 2021b).
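One way to make the long-tailed distribution concrete is to count how few classes are needed to cover most of the images, and how large the gap is between the most and least frequent class. The sketch below is illustrative only: the labels are synthetic toy data, and the helper names `classes_needed_for_coverage` and `imbalance_ratio` are hypothetical, not part of any PlantNet-300k tooling.

```python
from collections import Counter


def classes_needed_for_coverage(labels, coverage=0.8):
    """Return how many of the most frequent classes cover `coverage` of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    covered = 0
    # Walk classes from most to least frequent, accumulating their sample counts.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= coverage * total:
            return rank
    return len(counts)


def imbalance_ratio(labels):
    """Ratio between the largest and the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Synthetic toy labels (an assumption for illustration, not real dataset counts).
toy_labels = ["a"] * 80 + ["b"] * 10 + ["c"] * 5 + ["d"] * 3 + ["e"] * 2

print(classes_needed_for_coverage(toy_labels, coverage=0.85))
print(imbalance_ratio(toy_labels))
```

On a real dataset, `labels` would be the list of species identifiers for the training images; a small return value from `classes_needed_for_coverage` relative to the number of classes indicates a heavy head and a long tail.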
To address these challenges, we employed a two-fold approach. First, we used transformer-based models to tackle the dataset's intrinsic ambiguity and capture the complex visual patterns present in plant images. Second, we mitigated the class imbalance through data preprocessing, specifically class balancing methods, aiming to ensure fair representation of all plant species and thereby improve the overall performance of image classification models. Our objective is to assess the effects of class balancing on classification performance on the PlantNet-300k dataset. We explored different preprocessing methods to address the class imbalance and compared the performance of transformer-based models with and without class balancing. Our ultimate goal is to ascertain whether these techniques yield more accurate and reliable classification results, particularly for underrepresented species in the dataset. In our experiment, we compared two transformer-based models, ViT and CvT, on two versions of the PlantNet-300k dataset: one with class balancing and one without. This setup yields a total of four sets of metrics for evaluation. To assess classification performance, we used a wide range of common metrics, including recall, precision, accuracy, AUC (Area Under the Curve), ROC (Receiver Operating Characteristic) curves, and others. These metrics provide insight into each model's ability to correctly classify plant species, reveal false positives and false negatives, measure overall accuracy, and assess the model's discriminatory power.
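To illustrate why several metrics are reported rather than accuracy alone, the following sketch computes per-class precision and recall, overall accuracy, and macro-averaged recall on a toy imbalanced example. It is a minimal illustration under stated assumptions: the abstract does not say how metrics were averaged across classes, so the macro average here is an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} computed from paired label lists."""
    tp, pred_count, true_count = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        true_count[t] += 1
        pred_count[p] += 1
        if t == p:
            tp[t] += 1
    result = {}
    for c in set(true_count) | set(pred_count):
        # Guard against division by zero for classes never predicted or never present.
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        result[c] = (precision, recall)
    return result


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of recall over the classes present in y_true."""
    scores = per_class_precision_recall(y_true, y_pred)
    present = set(y_true)
    return sum(scores[c][1] for c in present) / len(present)


# Toy example (assumed data): a model that always predicts the majority class.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10

print(accuracy(y_true, y_pred))      # 0.8
print(macro_recall(y_true, y_pred))  # 0.5
```

In this toy case, accuracy looks acceptable while macro-averaged recall exposes that the rare class is never recognised, which is the kind of effect a comparison with and without class balancing would need to surface for underrepresented species.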
By conducting this comparative study, we seek to contribute to plant identification research by providing empirical evidence on the benefits and effectiveness of class balancing techniques for improving the performance of transformer-based models on the PlantNet-300k dataset and similar datasets.
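The abstract does not state which class balancing methods were used. As a hedged illustration of what such preprocessing can look like, the sketch below shows two common options: random oversampling of minority classes and inverse-frequency class weights. Both are assumptions chosen for illustration, not the authors' method, and the file names and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # Draw duplicates with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count), an alternative to resampling."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical file names and labels, for illustration only.
samples = [f"img_{i}.jpg" for i in range(10)]
labels = ["common"] * 8 + ["rare"] * 2

balanced_samples, balanced_labels = oversample_to_balance(samples, labels)
print(Counter(balanced_labels))
print(inverse_frequency_weights(labels))
```

In a setup like this, resampling would normally be applied to the training split only, so that validation and test metrics still reflect the original class distribution.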
