Few-shot Food Recognition with Pre-trained Model
Yanqi Wu, Xue Song, Jingjing Chen
DOI: 10.1145/3552485.3554939
Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications, 2022-10-10
Food recognition is a challenging task due to the diversity of food. Moreover, conventionally training food recognition networks demands large amounts of labeled images, which are laborious and expensive to collect. In this work, we tackle the challenging few-shot food recognition problem by leveraging knowledge learned by pre-trained models, e.g., CLIP. Although CLIP has shown remarkable zero-shot capability on a wide range of vision tasks, it performs poorly on the domain-specific food recognition task. To transfer CLIP's rich prior knowledge, we explore an adapter-based approach that fine-tunes CLIP with only a few samples, effectively combining CLIP's prior knowledge with the new knowledge extracted from the few-shot training set. In addition, we design appropriate prompts to facilitate more accurate identification of foods from different cuisines. Experiments demonstrate that our approach achieves promising performance on two public food datasets, VIREO Food-172 and UECFood-256.
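The abstract does not give implementation details, but the described idea, a lightweight adapter that blends CLIP's frozen zero-shot features with features learned from the few-shot set, can be sketched roughly in the style of residual CLIP adapters. Everything below is an illustrative assumption, not the authors' actual method: the feature dimension (512), bottleneck `reduction`, residual ratio `alpha`, and the cuisine-aware prompt templates are all hypothetical placeholders, and random tensors stand in for real CLIP image/text embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLIPAdapter(nn.Module):
    """Hypothetical residual adapter over frozen CLIP image features.

    A small bottleneck MLP extracts new knowledge from the few-shot
    training set; the residual ratio `alpha` blends it with the
    original feature, preserving CLIP's prior knowledge.
    """
    def __init__(self, dim: int = 512, reduction: int = 4, alpha: float = 0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim, bias=False),
            nn.ReLU(inplace=True),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        adapted = self.mlp(image_features)
        # Blend few-shot knowledge with CLIP's zero-shot prior.
        return self.alpha * adapted + (1.0 - self.alpha) * image_features

def classify(image_features: torch.Tensor,
             text_features: torch.Tensor,
             adapter: CLIPAdapter,
             temperature: float = 100.0) -> torch.Tensor:
    """Cosine-similarity logits against per-class text embeddings."""
    img = F.normalize(adapter(image_features), dim=-1)
    txt = F.normalize(text_features, dim=-1)
    return temperature * img @ txt.t()  # shape: [batch, num_classes]

# Hypothetical cuisine-aware prompt templates for the text encoder.
TEMPLATES = [
    "a photo of {}, a Chinese dish.",
    "a photo of {}, a Japanese dish.",
]
```

Only the adapter's MLP would be trained on the few-shot set; with `alpha = 0` the model reduces exactly to zero-shot CLIP, which is the sense in which prior and new knowledge are combined.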