Multimodal Dish Pairing: Predicting Side Dishes to Serve with a Main Dish
Taichi Nishimura, Katsuhiko Ishiguro, Keita Higuchi, Masaaki Kotera. DOI: https://doi.org/10.1145/3552485.3554934

Planning a food menu is an essential everyday task that requires weighing many perspectives. To reduce this burden, this study tackles a novel problem we call multimodal dish pairing (MDP): retrieving suitable side dishes for a query main dish. The key challenge of MDP is learning human subjectivity, i.e., the one-to-many relationship between main and side dishes. In general, however, web resources include only one-to-one, manually created pairs of main and side dishes. To address this, this study assumes that side dishes similar to a manually created side dish are also acceptable for the query main dish. We then imitate a one-to-many relationship by computing the similarity of side dishes as side-dish scores and assigning them to unknown main-and-side-dish pairs. Based on these scores, we train a neural network to learn the suitability of side dishes through learning-to-rank techniques, fully leveraging the multimodal representations of the dishes. For the experiments, we created a dataset by crawling recipes from an online menu site and evaluated the proposed method on five criteria: retrieval evaluation, overlapping ingredients, overlapping cooking methods, consistency of dish styles, and human evaluation. Our experimental results show that the proposed method is superior to the baseline on all five criteria. A qualitative analysis further demonstrates that the proposed method retrieves side dishes suitable for the main dish.
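To make the pseudo-labeling idea concrete, here is a minimal sketch of how similarity-based side-dish scores could be assigned to unknown pairs and used as learning-to-rank supervision. The embedding inputs, the `model` signature, and the margin value are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: side dishes similar to the one curated pairing
# receive soft relevance scores, which then supervise a pairwise
# learning-to-rank objective over multimodal dish embeddings.
import torch
import torch.nn.functional as F

def side_dish_scores(curated_side_emb, candidate_side_embs):
    """Score unknown (main, side) pairs by cosine similarity between each
    candidate side dish and the single manually created side dish."""
    return F.cosine_similarity(
        candidate_side_embs, curated_side_emb.unsqueeze(0), dim=-1
    )

def pairwise_rank_loss(model, main_emb, side_embs, scores, margin=0.2):
    """Margin ranking loss: a higher-scored side dish should be ranked
    above a lower-scored one for the same query main dish."""
    pred = model(main_emb, side_embs)         # predicted suitability per candidate
    i, j = torch.triu_indices(len(scores), len(scores), offset=1)
    sign = torch.sign(scores[i] - scores[j])  # +1 if candidate i should outrank j
    keep = sign != 0                          # drop ties, which carry no ranking signal
    return F.margin_ranking_loss(pred[i][keep], pred[j][keep],
                                 sign[keep], margin=margin)
```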
{"title":"Multimodal Dish Pairing: Predicting Side Dishes to Serve with a Main Dish","authors":"Taichi Nishimura, Katsuhiko Ishiguro, Keita Higuchi, Masaaki Kotera","doi":"10.1145/3552485.3554934","DOIUrl":"https://doi.org/10.1145/3552485.3554934","url":null,"abstract":"Planning a food menu is an essential task in our daily lives. We need to plan a menu by considering various perspectives. To reduce the burden when planning a menu, this study first tackles a novel problem of multimodal dish pairing (MDP), i.e., retrieving suitable side dishes given a query main dish. The key challenge of MDP is to learn human subjectivity, i.e., one-to-many relationships of the main and side dishes. However, in general, web resources only include one-to-one manually created pairs of main and side dishes. To tackle this problem, this study assumes that if side dishes are similar to a manually created side dish, they are also acceptable for the query main dish. We then imitate a one-to-many relationship by computing the similarity of side dishes as side dish scores and assigning them to unknown main and side dish pairs. Based on this score, we train a neural network to learn the suitability of the side dishes through learning-to-rank techniques by fully leveraging the multimodal representations of the dishes. During the experiments, we created a dataset by crawling recipes from an online menu site and evaluated the proposed method based on five criteria: retrieval evaluation, overlapping ingredients, overlapping cooking methods, consistency of the dish styles, and human evaluations. Our experiment results show that the proposed method is superior to the baseline in terms of these five criteria. The results of the qualitative analysis further demonstrates that the proposed method can retrieve side dishes suitable for the main dish.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115307646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-shot Food Recognition with Pre-trained Model
Yanqi Wu, Xue Song, Jingjing Chen. DOI: https://doi.org/10.1145/3552485.3554939

Food recognition is a challenging task due to the diversity of food, and conventional training of food recognition networks demands large amounts of labeled images, which are laborious and expensive to collect. In this work, we tackle the challenging few-shot food recognition problem by leveraging knowledge learned by pre-trained models, e.g., CLIP. Although CLIP has shown remarkable zero-shot capability on a wide range of vision tasks, it performs poorly on the domain-specific food recognition task. To transfer CLIP's rich prior knowledge, we explore an adapter-based approach that fine-tunes CLIP with only a few samples, effectively combining CLIP's prior knowledge with the new knowledge extracted from the few-shot training set. We also design appropriate prompts to facilitate more accurate identification of foods from different cuisines. Experiments demonstrate that our approach achieves promising performance on two public food datasets, VIREO Food-172 and UECFood-256.
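As a concrete illustration of the adapter idea, here is a minimal sketch in the spirit of published CLIP adapters: a small trainable residual MLP refines frozen CLIP image features before similarity-based classification against prompt embeddings. The layer sizes, residual ratio, and prompt wording are assumptions, not necessarily the authors' architecture.

```python
# Minimal adapter-style sketch for few-shot CLIP tuning. Only the small
# MLP is trained on the few-shot set; the CLIP backbone stays frozen.
import torch
import torch.nn as nn

class ClipAdapter(nn.Module):
    def __init__(self, dim=512, reduction=4, alpha=0.2):
        super().__init__()
        self.alpha = alpha  # residual blend: weight of the adapted feature
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )

    def forward(self, image_features, text_features, logit_scale=100.0):
        x = self.fc(image_features)
        # Blend adapted and frozen features, then classify by similarity to
        # text embeddings of prompts such as "a photo of {dish}, a kind of
        # {cuisine} food" (prompt wording here is an assumption).
        x = self.alpha * x + (1 - self.alpha) * image_features
        x = x / x.norm(dim=-1, keepdim=True)
        t = text_features / text_features.norm(dim=-1, keepdim=True)
        return logit_scale * x @ t.t()  # (batch, num_classes) logits
```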
{"title":"Few-shot Food Recognition with Pre-trained Model","authors":"Yanqi Wu, Xue Song, Jingjing Chen","doi":"10.1145/3552485.3554939","DOIUrl":"https://doi.org/10.1145/3552485.3554939","url":null,"abstract":"Food recognition is a challenging task due to the diversity of food. However, conventional training in food recognition networks demands large amounts of labeled images, which is laborious and expensive. In this work, we aim to tackle the challenging few-shot food recognition problem by leveraging the knowledge learning from pre-trained models, e.g., CLIP. Although CLIP has shown a remarkable zero-shot capability on a wide range of vision tasks, it performs poorly in the domain-specific food recognition task. To transfer CLIP's rich prior knowledge, we explore an adapter-based approach to fine-tune CLIP with only a few samples. Thus we combine CLIP's prior knowledge with the new knowledge extracted from the few-shot training set effectively for achieving good performance. Besides, we also design appropriate prompts to facilitate more accurate identification of foods from different cuisines. Experiments demonstrate that our approach achieves quite promising performance on two public food datasets, including VIREO Food-172 and UECFood-256.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131397683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recipe Recommendation for Balancing Ingredient Preference and Daily Nutrients
Sara Ozeki, Masaaki Kotera, Katsuhiko Ishiguro, Taichi Nishimura, Keita Higuchi. DOI: https://doi.org/10.1145/3552485.3554941

In this work, we propose a recipe recommendation system for daily eating habits that balances user preference with nutrient balance. The method prompts for user input and allows ingredients to be substituted or added while reflecting the user's preferences. The system also considers daily nutrient balance so as to meet dietary reference intakes for nutrients such as carbohydrates, protein, and fat. As users select a day's worth of preferred recipes, the system updates its recommendations based on the user's selections and on excess or deficiency relative to predefined nutritional criteria. We ran a simulation study to assess the performance of the proposed algorithm. Using our recipe-planning application, we also conducted a user study in which participants chose a day's worth of recipes containing preferred ingredients. The results show that the proposed system helps users assemble recipes with better nutrient balance than a traditional ingredient-based search. In addition, participants liked the recommendations from the proposed system, which improved their satisfaction with recipe selection.
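The excess/deficiency update could look like the following minimal sketch, which re-ranks candidate recipes by how well they fill the nutrients still missing from the day's selections. The reference-intake values and the recipe schema are hypothetical, not the authors' actual criteria.

```python
# Illustrative re-ranking sketch: after each user selection, compute the
# day's remaining deficit against dietary reference intakes and prefer
# recipes that fill the gaps most closely.
DAILY_TARGETS = {"carbohydrate_g": 300.0, "protein_g": 60.0, "fat_g": 65.0}

def remaining_deficit(selected_recipes):
    """Nutrients still needed after the recipes chosen so far."""
    totals = {k: 0.0 for k in DAILY_TARGETS}
    for recipe in selected_recipes:
        for k in totals:
            totals[k] += recipe["nutrients"].get(k, 0.0)
    return {k: max(DAILY_TARGETS[k] - totals[k], 0.0) for k in DAILY_TARGETS}

def rank_candidates(candidates, selected_recipes):
    """Order candidates by how closely their nutrients match the remaining
    deficit (smaller absolute gap summed over nutrients is better)."""
    deficit = remaining_deficit(selected_recipes)
    def gap(recipe):
        return sum(abs(deficit[k] - recipe["nutrients"].get(k, 0.0))
                   for k in DAILY_TARGETS)
    return sorted(candidates, key=gap)
```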
{"title":"Recipe Recommendation for Balancing Ingredient Preference and Daily Nutrients","authors":"Sara Ozeki, Masaaki Kotera, Katushiko Ishiguro, Taichi Nishimura, Keita Higuchi","doi":"10.1145/3552485.3554941","DOIUrl":"https://doi.org/10.1145/3552485.3554941","url":null,"abstract":"In this work, we propose a recipe recommendation system for daily eating habits based on user preference and nutrient balance. This method prompts user input and allows for the substitution or addition of ingredients while reflecting the user's preferences. The system also considers daily nutrient balance to fill dietary reference intakes such as carbohydrates, protein, and fat. While users select a day's worth of preferred recipes, the system updates the recommendation based on user selection and excess/deficiency predefined nutritional criteria. We run a simulation study to see the performance of the proposed algorithm. With our recipe planning application, we also performed a user study that participants chose a day's worth of recipes with preferred ingredients. The results show that the proposed system helps make better nutrient balance recipes than traditional ingredient-based search. In addition, the participants liked recommendations from the proposed system that improved satisfaction with recipe selection.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123811897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CEA++2022 Panel - Toward Building a Global Food Network
Yoko Yamakata, S. Mougiakakou, Ramesh C. Jain. DOI: https://doi.org/10.1145/3552485.3554972

How can we create a global food network? Attempts are being made worldwide to lead people to healthier eating habits. These efforts are not always academic; they often take place on a small scale and privately, in hospitals, nursing homes, schools, and various other organizations, which may be collecting and manually analyzing data such as recipes and food records. These data are precious, but in many cases they are never made public. In academia, meanwhile, dictionaries, corpora, and knowledge graphs are constructed manually at high cost, yet such knowledge is rarely shared, so other groups keep regenerating it. How can we reduce this wasteful work and allow data and knowledge to be shared and leveraged? Several issues complicate sharing food data. First, food cultures differ from country to country and region to region, so food data produced in one area rarely transfers as-is to another. Second, food data, especially when linked to medical care, is likely to contain private information and must be anonymized before sharing; anonymizing data without losing its intrinsic value is difficult, and in many cases knowledge sharing must be abandoned. In addition, food logging is burdensome: eating takes place every day, multiple times a day, and recording every meal requires tremendous effort. Yet the impact of a single meal on a person's body is minimal; it is the long-term record that matters in guiding a person to good health. In this panel discussion, we invite Prof. Stavroula Mougiakakou, General Chair of MADiMa22, a workshop co-located with CEA++22, and Prof. Ramesh Jain, the keynote speaker of CEA++22, to discuss the issues raised above. The moderator, Prof. Yoko Yamakata, will make the panel discussion open to all, and participants from MADiMa22 and CEA++22 are welcome to join.
{"title":"CEA++2022 Panel - Toward Building a Global Food Network","authors":"Yoko Yamakata, S. Mougiakakou, Ramesh C. Jain","doi":"10.1145/3552485.3554972","DOIUrl":"https://doi.org/10.1145/3552485.3554972","url":null,"abstract":"How can we create a global food network? Attempts are being made worldwide to lead people to healthier eating habits. They are not always in the academic field but often on a small scale and privately, in hospitals, nursing homes, schools, and various organizations. They may be collecting and manually analyzing data such as recipes and food records. They are precious data, but in many cases, they are never made public. And in academia, dictionaries, corpora, and knowledge graphs are constructed manually at a high cost, but such knowledge is never shared, and another group continues to generate new knowledge. How can we reduce this wasteful work and allow data and knowledge to be shared and leveraged? There are several issues involved in sharing food data. First, food cultures differ from country to country and region to region. Food data produced in one area rarely works as in another. Food data, especially when linked to medical care, is likely to contain private information and must be anonymized when shared. Anonymizing data without losing its intrinsic value is complex, and knowledge sharing must be abandoned in many cases. In addition, food logging is burdensome. Eating takes place every day, multiple times a day. Recording every meal requires a tremendous amount of effort. However, the impact of a single meal on a person's body is minimal, and it is the long-term record that is important in guiding a person to good health. In this panel discussion, we invite Prof. Stavroula Mougiakakou, General Chair of MADiMa22, a workshop co-located with CEA++22, and Prof. Ramesh Jain, the keynote speaker of CEA++22, to discuss the issues raised above. The Moderator, Prof. Yoko Yamakata will make the panel discussion open to all, and participants from MADiMa22 and CEA++22 are also welcome to join the discussion.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126332196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ABLE: Aesthetic Box Lunch Editing
Yutong Zhou, N. Shimada. DOI: https://doi.org/10.1145/3552485.3554935

This paper presents exploratory research comprising a pre-trained ordering-recovery model that obtains correct placement sequences from box lunch images, and a generative adversarial network that composites novel box lunch presentations from single food items and generated layouts. Furthermore, we present Bento800, the first cleanly annotated, high-quality, and standardized dataset for aesthetic box lunch presentation generation and other downstream tasks. The Bento800 dataset is available at https://github.com/Yutong-Zhou-cv/Bento800_Dataset.
{"title":"ABLE: Aesthetic Box Lunch Editing","authors":"Yutong Zhou, N. Shimada","doi":"10.1145/3552485.3554935","DOIUrl":"https://doi.org/10.1145/3552485.3554935","url":null,"abstract":"This paper proposes an exploratory research that contains a pre-trained ordering recovery model to obtain correct placement sequences from box lunch images, and a generative adversarial network to composite novel box lunch presentations from single item food and generated layouts. Furthermore, we present Bento800, the first cleanly annotated, high-quality, and standardized dataset for aesthetic box lunch presentation generation and other downstream tasks. Bento800 dataset is available at urlhttps://github.com/Yutong-Zhou-cv/Bento800_Dataset.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130691268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recipe Recording by Duplicating and Editing Standard Recipe
Akihisa Ishino, Yoko Yamakata, K. Aizawa. DOI: https://doi.org/10.1145/3552485.3554942

The best way to ascertain the exact nutritional value of a user's food intake is for the user to record the recipe for that food themselves. However, writing a recipe from scratch is tedious and impractical. We therefore propose a method that allows users to write their own recipes in a short time by duplicating and editing a standard recipe. We developed a smartphone application and conducted an experiment in which 19 participants wrote their own recipes for about 10 food items each. The results show that the duplication method took 74% of the time required to write a recipe from scratch, and the number of editing operations was reduced to 45%. Future work includes constructing a dataset of standard recipes that can be rewritten into anyone's recipe at little editing cost.
{"title":"Recipe Recording by Duplicating and Editing Standard Recipe","authors":"Akihisa Ishino, Yoko Yamakata, K. Aizawa","doi":"10.1145/3552485.3554942","DOIUrl":"https://doi.org/10.1145/3552485.3554942","url":null,"abstract":"The best way to ascertain the exact nutritional value of a user's food intake is to have the user record the recipe for that food himself/herself. However, writing a recipe from scratch is tedious and impractical. Therefore, we proposed a method that allows users to write their own recipe in a short time by duplicating and editing a standard recipe. We developed a smartphone application and conducted an experiment in which 19 participants were asked to write their own recipes for about 10 food items each. The results showed that the duplication method took 74% of the time compared to writing a recipe from scratch. The number of editing operations was also reduced to 45%. Future work is to construct a dataset of standard recipes that can be rewritten with little editing cost for any person's recipe.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123612534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating a World Food Atlas
Ramesh C. Jain. DOI: https://doi.org/10.1145/3552485.3552517

I face a problem multiple times every day: what am I going to eat, how much, and where? Where can I get enjoyable, healthy food? We live in a world where the latest geospatial information of interest around us is available in the palm of our hand on a smartphone, with navigational guidance if needed. However, the most vital life information, that related to food, remains inaccessible. Food is vital for the health and enjoyment of people, society, and the planet, yet data, information, and knowledge related to food suffer from inaccessibility, disinformation, and ignorance. A dependable, trusted, accessible, and dynamic source of geo-indexed food data providing culinary, nutritional, and environmental characteristics is essential for guiding holistic food decisions. A good amount of food-related data and knowledge is already available in different silos. Those silos may be assimilated into a World Food Atlas (WFA) and made available for designing food-centered applications, including food recommendation. The WFA contains information about the locations of sources for food ingredients, dishes, recipes, and consumption patterns, all of which may become available through ubiquitous maps. The WFA will help people make better decisions for personal, societal, and planetary health. We believe there is an urgent need, and the technology is ready to make it happen. Since food varies significantly even across short distances, and food preparations depend on local culture and socio-economic conditions, it is important that local people be involved in creating such an atlas. We have started an open-data World Food Atlas project and invite all interested people to contribute. We need people from different areas to help populate the WFA and use it. The project is in its infancy; we are building a global community that will make this happen, and we invite you to participate in this exciting project.
{"title":"Creating a World Food Atlas","authors":"Ramesh C. Jain","doi":"10.1145/3552485.3552517","DOIUrl":"https://doi.org/10.1145/3552485.3552517","url":null,"abstract":"I face a problem multiple times every day: What am I going to eat, how much, and where? Where can I get enjoyable healthy food? We live in a world where latest geo-spatial information of interest around us is available in the palm of our hand in our smart phone with navigational guidance, if needed. However, the most vital life information related to food remains inaccessible. Food is vital for health and enjoyment by people, society, and planet. However, data, information, and knowledge related to food suffer from inaccessibility, disinformation, and ignorance. A dependable, trusted, accessible, and dynamic source of geo-indexed food data providing culinary, nutritional, and environmental characteristics is essential for guiding wholistic food decisions. A good amount of data and knowledge related to food is already available in different silos. All those silos may be assimilated into a World Food Atlas (WFA) and made available to people to use it for designing food-centered applications, including food recommendation. WFA contains information about location of sources for food ingredients, dishes, recipes, and consumption patterns. All this information may become available through ubiquitous maps. WFA will help in making better decisions for personal, societal, and planetary health. We believe that there is an urgent need and technology is ready to make it happen. Since food varies significantly across even shorter distances and food preparations are dependent on local culture and socio-economic conditions, it is important that local people are involved in creating such an atlas. We have started an open-data World Food Atlas project and are inviting participation of all interested people to contribute. We need people from different area to help populate WFA and use it. The project is in its infancy. We are building a global community that will make this happen. We invite you to participate in this exciting project.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124416687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Sequential Transformation Information of Ingredients for Fine-Grained Cooking Activity Recognition
Atsushi Okamoto, Katsufumi Inoue, M. Yoshioka. DOI: https://doi.org/10.1145/3552485.3554940

The goal of our research is to recognize fine-grained cooking activities (e.g., dicing or mincing within cutting) in egocentric videos from the sequential transformation of ingredients processed by the camera wearer. These activities are classified according to the state of the ingredients after processing, and they often involve the same cooking utensils and similar motions, which makes their recognition a challenging task in computer vision and multimedia analysis. To tackle this problem, we need to perceive the sequential state transformation of ingredients precisely. To this end, we propose a new GAN-based network whose characteristic points are: 1) we crop images around the ingredient as preprocessing to remove environmental information; 2) we generate intermediate images from the past and future images to capture sequential information in the generator network; 3) an adversarial network is employed as a discriminator to classify whether an input image is generated or not; and 4) we employ a temporally coherent network to check the temporal smoothness of input images and to predict cooking activities by comparing the original sequential images with the generated ones. To investigate the effectiveness of our proposed method, as a first step we focus on cutting activities. In this paper, we report experimental results on our originally prepared dataset demonstrating the effectiveness of the proposed method.
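A schematic reading of points 2) and 3) might look like the sketch below: a generator predicts the intermediate ingredient crop from the past and future crops, and a discriminator judges real versus generated frames. All layer choices are illustrative assumptions; the abstract does not specify the paper's actual network configuration.

```python
# Schematic sketch of intermediate-frame generation with an adversarial
# discriminator, as described in points 2) and 3) of the abstract.
import torch
import torch.nn as nn

class IntermediateFrameGenerator(nn.Module):
    """Predicts the middle frame of an ingredient-crop sequence from the
    concatenated past and future RGB crops (6 input channels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, past, future):
        return self.net(torch.cat([past, future], dim=1))

class FrameDiscriminator(nn.Module):
    """Classifies whether an ingredient crop is real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, frame):
        return self.net(frame)  # logit: real vs. generated
```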
{"title":"Learning Sequential Transformation Information of Ingredients for Fine-Grained Cooking Activity Recognition","authors":"Atsushi Okamoto, Katsufumi Inoue, M. Yoshioka","doi":"10.1145/3552485.3554940","DOIUrl":"https://doi.org/10.1145/3552485.3554940","url":null,"abstract":"The goal of our research is to recognize the fine-grained cooking activities (e.g., dicing or mincing in cutting) in the egocentric videos from the sequential transformation of ingredients that are processed by the camera-wearer; these types of activities are classified according to the state of ingredients after processing, and we often utilize the same cooking utensils and similar motions in such activities. Due to the above conditions, the recognition of such activities is a challenging task in computer vision and multimedia analysis. To tackle this problem, we need to perceive the sequential state transformation of ingredients precisely. In this research, to realize this, we propose a new GAN-based network whose characteristic points are 1) we crop images around the ingredient as a preprocessing to remove the environmental information, 2) we generate intermediate images from the past and future images to obtain the sequential information in the generator network, 3) the adversarial network is employed as a discriminator to classify whether the input image is generated one or not, and 4) we employ the temporally coherent network to check the temporal smoothness of input images and to predict cooking activities by comparing the original sequential images and the generated ones. To investigate the effectiveness of our proposed method, for the first step, we especially focus on \"textitcutting activities \". From the experimental results with our originally prepared dataset, in this paper, we report the effectiveness of our proposed method.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122118908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIAIS: A Multimedia Recipe Dataset with Ingredient Annotation at Each Instructional Step
Yixin Zhang, Yoko Yamakata, Keishi Tajima. DOI: https://doi.org/10.1145/3552485.3554938

In this paper, we introduce MIAIS, a Multimedia recipe dataset with Ingredient Annotation at every Instructional Step. One unique feature of recipe data is that it is usually presented in a sequential, multimedia form; however, few publicly available recipe datasets contain paired text-image data for every cooking step. Our goal is to construct a recipe dataset that contains sufficient multimedia data, with annotations for every cooking step, which is important for many research topics such as cooking flow graph generation, recipe text generation, and cooking action recognition. MIAIS contains 12,000 recipes; each recipe has 9.13 cooking instruction steps on average, and each step is a tuple of a text description and an image. The text descriptions and images are collected from the NII Cookpad Dataset and the Cookpad Image Dataset, respectively. We have already released our annotation data and related information.
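One plausible way to represent such per-step annotated recipes in code is sketched below. The field names and file layout are assumptions for illustration; the released annotation data defines the actual schema.

```python
# Hypothetical data model for a MIAIS-style recipe: a sequence of steps,
# each pairing instruction text and an image with per-step ingredients.
from dataclasses import dataclass

@dataclass
class Step:
    text: str               # instruction text (from the NII Cookpad Dataset)
    image_path: str         # step image (from the Cookpad Image Dataset)
    ingredients: list[str]  # ingredients annotated at this step

@dataclass
class Recipe:
    recipe_id: str
    steps: list[Step]       # 9.13 steps per recipe on average

example = Recipe(
    recipe_id="miais-000001",
    steps=[Step(text="Dice the onion.",
                image_path="steps/000001_01.jpg",
                ingredients=["onion"])],
)
```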
{"title":"MIAIS: A Multimedia Recipe Dataset with Ingredient Annotation at Each Instructional Step","authors":"Yixin Zhang, Yoko Yamakata, Keishi Tajima","doi":"10.1145/3552485.3554938","DOIUrl":"https://doi.org/10.1145/3552485.3554938","url":null,"abstract":"In this paper, we introduce a multimedia recipe dataset with annotation of ingredients at every instructional step, named MIAIS (Multimedia recipe dataset with Ingredient Annotation at every Instructional Step). One unique feature of recipe data is that it is usually presented in a sequential and multimedia form. However, few publicly available recipe datasets contain multimedia text-image paired data for every cooking step. Our goal is to construct a recipe dataset that contains sufficient multimedia data and the annotations to them for every cooking step, which is important for many research topics, such as cooking flow graph generation, recipe text generation, and cooking action recognition. MIAIS contains 12,000 recipes; each recipe has 9.13 cooking instruction steps on average, each of which is a tuple of a text description and an image. The text descriptions and images are collected from the NII Cookpad Dataset and Cookpad Image Dataset, respectively. We have already released our annotation data and related information.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126297850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Comparable Recipes": A Construction and Analysis of a Dataset of Recipes Described by Different People for the Same Dish
Rina Kagawa, Rei Miyata, Yoko Yamakata. DOI: https://doi.org/10.1145/3552485.3554936

Recording high-quality textual recipes is effective for documenting food culture. However, comparing the quality of various recipes is difficult because recipe quality may depend on the description style and the dish. We therefore constructed the "Comparable Recipes" dataset as follows. First, each of 64 writers described five recipes after watching five home-cooking videos, yielding 318 recipes in total; each dish (video) had 15.9 recipes on average, each described by a different writer. Next, 335 readers evaluated the quality (i.e., the reproducibility and completeness) of each recipe. A morphological analysis of this dataset revealed that the amount of description per cooking step affects recipe quality. Furthermore, the effect on recipe quality of integrating cooking procedures into cooking steps tended to depend on the reader's skill. The results suggest a need for description support that appropriately integrates cooking procedures into cooking steps according to the skills and preferences of the reader.
{"title":"\"Comparable Recipes\": A Construction and Analysis of a Dataset of Recipes Described by Different People for the Same Dish","authors":"Rina Kagawa, Rei Miyata, Yoko Yamakata","doi":"10.1145/3552485.3554936","DOIUrl":"https://doi.org/10.1145/3552485.3554936","url":null,"abstract":"Recording high-quality textual recipes is effective for documenting food culture. However, comparing the quality of various recipes is difficult because recipe quality might depend on a variety of description styles and dishes. Therefore, we constructed the following \"Comparative Recipes\" dataset. First, each of the 64 writers described five recipes after watching five home cooking videos. A total of 318 recipes were created. For each dish (video), there were 15.9 recipes on average, and each recipe was described by a different writer. Next, 335 recipe readers evaluated the quality (i.e., the reproducibility and completeness) of each recipe. A morphological analysis that used this dataset revealed that the amount of description per cooking step affects recipe quality. Furthermore, the effects of cooking procedures being integrated into cooking steps on recipe quality tended to be dependent on the reader's skill. The results suggest a need for description support that appropriately integrates cooking procedures into cooking steps according to the skills and preferences of the reader.","PeriodicalId":338126,"journal":{"name":"Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications","volume":"33 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133784574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}