Sofia Henninger, Maximilian Kellner, Benedikt Rombach, Alexander Reiterer
{"title":"使用预训练基础模型减少训练数据:使用 Segment Anything 模型进行交通标志分割的案例研究。","authors":"Sofia Henninger, Maximilian Kellner, Benedikt Rombach, Alexander Reiterer","doi":"10.3390/jimaging10090220","DOIUrl":null,"url":null,"abstract":"<p><p>The utilization of robust, pre-trained foundation models enables simple adaptation to specific ongoing tasks. In particular, the recently developed Segment Anything Model (SAM) has demonstrated impressive results in the context of semantic segmentation. Recognizing that data collection is generally time-consuming and costly, this research aims to determine whether the use of these foundation models can reduce the need for training data. To assess the models' behavior under conditions of reduced training data, five test datasets for semantic segmentation will be utilized. This study will concentrate on traffic sign segmentation to analyze the results in comparison to Mask R-CNN: the field's leading model. The findings indicate that SAM does not surpass the leading model for this specific task, regardless of the quantity of training data. Nevertheless, a knowledge-distilled student architecture derived from SAM exhibits no reduction in accuracy when trained on data that have been reduced by 95%.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":null,"pages":null},"PeriodicalIF":2.7000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11433296/pdf/","citationCount":"0","resultStr":"{\"title\":\"Reducing Training Data Using Pre-Trained Foundation Models: A Case Study on Traffic Sign Segmentation Using the Segment Anything Model.\",\"authors\":\"Sofia Henninger, Maximilian Kellner, Benedikt Rombach, Alexander Reiterer\",\"doi\":\"10.3390/jimaging10090220\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The utilization of robust, pre-trained foundation models enables simple adaptation to specific ongoing tasks. In particular, the recently developed Segment Anything Model (SAM) has demonstrated impressive results in the context of semantic segmentation. Recognizing that data collection is generally time-consuming and costly, this research aims to determine whether the use of these foundation models can reduce the need for training data. To assess the models' behavior under conditions of reduced training data, five test datasets for semantic segmentation will be utilized. This study will concentrate on traffic sign segmentation to analyze the results in comparison to Mask R-CNN: the field's leading model. The findings indicate that SAM does not surpass the leading model for this specific task, regardless of the quantity of training data. Nevertheless, a knowledge-distilled student architecture derived from SAM exhibits no reduction in accuracy when trained on data that have been reduced by 95%.</p>\",\"PeriodicalId\":37035,\"journal\":{\"name\":\"Journal of Imaging\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11433296/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Imaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/jimaging10090220\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jimaging10090220","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
Reducing Training Data Using Pre-Trained Foundation Models: A Case Study on Traffic Sign Segmentation Using the Segment Anything Model.
The utilization of robust, pre-trained foundation models enables simple adaptation to specific ongoing tasks. In particular, the recently developed Segment Anything Model (SAM) has demonstrated impressive results in the context of semantic segmentation. Recognizing that data collection is generally time-consuming and costly, this research aims to determine whether the use of these foundation models can reduce the need for training data. To assess the models' behavior under conditions of reduced training data, five test datasets for semantic segmentation will be utilized. This study will concentrate on traffic sign segmentation to analyze the results in comparison to Mask R-CNN: the field's leading model. The findings indicate that SAM does not surpass the leading model for this specific task, regardless of the quantity of training data. Nevertheless, a knowledge-distilled student architecture derived from SAM exhibits no reduction in accuracy when trained on data that have been reduced by 95%.