Two-Stream Temporal Feature Aggregation Based on Clustering for Few-Shot Action Recognition
Authors: Long Deng; Ao Li; Bingxin Zhou; Yongxin Ge
Journal: IEEE Signal Processing Letters
DOI: 10.1109/LSP.2024.3456670
Published: 2024-09-09
URL: https://ieeexplore.ieee.org/document/10669816/
The metric learning paradigm has achieved notable success in few-shot action recognition; however, it still faces two unaddressed challenges: (1) limited training data could impede the exploration of temporal action relations, and (2) precision would decline when outliers are present during frame-level feature alignment. To address these challenges, we propose a two-stream temporal feature aggregation method based on clustering, incorporating a temporal augmentation module (TAM) and a feature aggregation module (FAM). The TAM integrates three consecutive grayscale frames into the original RGB frame through weighted summation, thereby reducing color-related misguidance and enhancing temporal information extraction. Meanwhile, the FAM employs clustering to aggregate the frame-level features into high-level semantic sub-actions and replaces the original features with cluster centers to mitigate the adverse impact of outliers on model performance. Experimental results on benchmark datasets demonstrate the effectiveness of our method in few-shot action recognition, and comprehensive ablation experiments validate the proposed approach.
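The two modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the summation weights, the grayscale conversion, or the clustering algorithm, so the BT.601 luma coefficients, the blending factor `alpha`, the neighbour `weights`, and the use of plain k-means with `k` clusters are all assumptions chosen for illustration. The function names `temporal_augment` and `aggregate_by_clustering` are likewise hypothetical.

```python
import numpy as np

# ITU-R BT.601 luma weights, a common RGB -> grayscale choice (assumption).
LUMA = np.array([0.299, 0.587, 0.114])


def temporal_augment(frames, alpha=0.5, weights=(0.25, 0.5, 0.25)):
    """TAM-like step: blend each RGB frame with a weighted sum of three
    consecutive grayscale frames (previous, current, next).

    frames: float array of shape (T, H, W, 3).
    alpha and weights are illustrative values, not taken from the paper.
    """
    frames = np.asarray(frames, dtype=np.float64)
    gray = frames @ LUMA  # (T, H, W)
    # Repeat the first and last frames so every frame has two neighbours.
    padded = np.concatenate([gray[:1], gray, gray[-1:]], axis=0)
    w_prev, w_cur, w_next = weights
    summary = w_prev * padded[:-2] + w_cur * padded[1:-1] + w_next * padded[2:]
    # Broadcast the single-channel summary over the three colour channels.
    return (1.0 - alpha) * frames + alpha * summary[..., None]


def assign_clusters(features, centers):
    """Return the index of the nearest centre for every feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def aggregate_by_clustering(features, k=3, iters=10):
    """FAM-like step: replace each frame-level feature with the centre of
    its cluster, using plain k-means (the clustering method is an assumption).

    features: array of shape (T, D). Returns (aggregated, labels, centers).
    """
    features = np.asarray(features, dtype=np.float64)
    # Deterministic initialisation: k frames evenly spaced in time.
    idx = np.linspace(0, len(features) - 1, k).round().astype(int)
    centers = features[idx].copy()
    for _ in range(iters):
        labels = assign_clusters(features, centers)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    labels = assign_clusters(features, centers)
    return centers[labels], labels, centers
```

In this sketch an outlier frame no longer contributes its own feature to the frame-level alignment; it is represented by the centre of the cluster it falls into, which is the effect the abstract attributes to the FAM.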
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Within one year of their appearance, papers published in the Letters can be presented at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.