Self-supervised visual representation learning on food images

Andrew W. Peng, Jiangpeng He, Fengqing Zhu
IS&T International Symposium on Electronic Imaging, published 2023-01-16.
DOI: 10.2352/ei.2023.35.7.image-269

Abstract

Food image classification is the groundwork for image-based dietary assessment, the process of monitoring what kinds of food, and how much energy, is consumed using captured food or eating-scene images. Existing deep learning based methods learn the visual representation for food classification from human annotation of each food image. However, most food images captured in real life are unlabeled, so training such methods requires costly human annotation, which is not feasible for real-world deployment. To make use of the vast amount of unlabeled images, many existing works turn to unsupervised or self-supervised learning, which learns the visual representation directly from unlabeled data. However, none of these works focuses on food images, which are more challenging than general objects due to their high inter-class similarity and intra-class variance. In this paper, we focus on two objectives: the comparison of existing models and the development of an effective self-supervised learning model for food image classification. Specifically, we first compare the performance on food images of existing state-of-the-art self-supervised learning models, including SimSiam, SimCLR, SwAV, BYOL, MoCo, and the Rotation Pretext Task. The experiments are conducted on the Food-101 dataset, which contains 101 different classes of food with 1,000 images per class. Next, we analyze the unique features of each model and compare their performance on food images to identify the key factors in each model that help improve accuracy. Finally, we propose a new model for unsupervised visual representation learning on food images for the classification task.
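The contrastive methods compared in the paper (e.g. SimCLR, MoCo) share a common idea: pull the embeddings of two augmented views of the same image together while pushing apart views of different images. As a minimal illustrative sketch (not the paper's implementation), SimCLR's NT-Xent loss can be written in NumPy as follows; the array shapes and the `temperature` default are illustrative assumptions:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross entropy) loss as used by
    SimCLR: each image's two augmented views form a positive pair; all other
    views in the batch act as negatives.

    z1, z2: (N, d) arrays of projected embeddings for the two views.
    Returns the mean loss over all 2N anchors.
    """
    N = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # mask self-similarity
    # the positive partner of anchor i is i+N (mod 2N)
    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * N), pos].mean()
```

When the two views of each image embed identically, the positive similarities dominate and the loss is low; mismatched views raise it, which is exactly the signal that drives representation learning without labels.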