Changwei Ouyang, Yun Yi, Hanli Wang, Jin Zhou, Tao Tian
{"title":"FineTea:茶道动作的新型细粒度动作识别视频数据集","authors":"Changwei Ouyang, Yun Yi, Hanli Wang, Jin Zhou, Tao Tian","doi":"10.3390/jimaging10090216","DOIUrl":null,"url":null,"abstract":"<p><p>Methods based on deep learning have achieved great success in the field of video action recognition. When these methods are applied to real-world scenarios that require fine-grained analysis of actions, such as being tested on a tea ceremony, limitations may arise. To promote the development of fine-grained action recognition, a fine-grained video action dataset is constructed by collecting videos of tea ceremony actions. This dataset includes 2745 video clips. By using a hierarchical fine-grained action classification approach, these clips are divided into 9 basic action classes and 31 fine-grained action subclasses. To better establish a fine-grained temporal model for tea ceremony actions, a method named TSM-ConvNeXt is proposed that integrates a TSM into the high-performance convolutional neural network ConvNeXt. Compared to a baseline method using ResNet50, the experimental performance of TSM-ConvNeXt is improved by 7.31%. Furthermore, compared with the state-of-the-art methods for action recognition on the FineTea and Diving48 datasets, the proposed approach achieves the best experimental results. The FineTea dataset is publicly available.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":null,"pages":null},"PeriodicalIF":2.7000,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11433221/pdf/","citationCount":"0","resultStr":"{\"title\":\"FineTea: A Novel Fine-Grained Action Recognition Video Dataset for Tea Ceremony Actions.\",\"authors\":\"Changwei Ouyang, Yun Yi, Hanli Wang, Jin Zhou, Tao Tian\",\"doi\":\"10.3390/jimaging10090216\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Methods based on deep learning have achieved great success in the field of video action recognition. When these methods are applied to real-world scenarios that require fine-grained analysis of actions, such as being tested on a tea ceremony, limitations may arise. To promote the development of fine-grained action recognition, a fine-grained video action dataset is constructed by collecting videos of tea ceremony actions. This dataset includes 2745 video clips. By using a hierarchical fine-grained action classification approach, these clips are divided into 9 basic action classes and 31 fine-grained action subclasses. To better establish a fine-grained temporal model for tea ceremony actions, a method named TSM-ConvNeXt is proposed that integrates a TSM into the high-performance convolutional neural network ConvNeXt. Compared to a baseline method using ResNet50, the experimental performance of TSM-ConvNeXt is improved by 7.31%. Furthermore, compared with the state-of-the-art methods for action recognition on the FineTea and Diving48 datasets, the proposed approach achieves the best experimental results. The FineTea dataset is publicly available.</p>\",\"PeriodicalId\":37035,\"journal\":{\"name\":\"Journal of Imaging\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11433221/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Imaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/jimaging10090216\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jimaging10090216","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
FineTea: A Novel Fine-Grained Action Recognition Video Dataset for Tea Ceremony Actions.
Methods based on deep learning have achieved great success in the field of video action recognition. When these methods are applied to real-world scenarios that require fine-grained analysis of actions, such as being tested on a tea ceremony, limitations may arise. To promote the development of fine-grained action recognition, a fine-grained video action dataset is constructed by collecting videos of tea ceremony actions. This dataset includes 2745 video clips. By using a hierarchical fine-grained action classification approach, these clips are divided into 9 basic action classes and 31 fine-grained action subclasses. To better establish a fine-grained temporal model for tea ceremony actions, a method named TSM-ConvNeXt is proposed that integrates a TSM into the high-performance convolutional neural network ConvNeXt. Compared to a baseline method using ResNet50, the experimental performance of TSM-ConvNeXt is improved by 7.31%. Furthermore, compared with the state-of-the-art methods for action recognition on the FineTea and Diving48 datasets, the proposed approach achieves the best experimental results. The FineTea dataset is publicly available.