{"title":"Modaldrop: Modality-Aware Regularization for Temporal-Spectral Fusion in Human Activity Recognition","authors":"Xin Zeng, Yiqiang Chen, Benfeng Xu, Tengxiang Zhang","doi":"10.1109/ICASSP49357.2023.10095880","DOIUrl":null,"url":null,"abstract":"Although most of existing works for sensor-based Human Activity Recognition rely on the temporal view, we argue that the spectral view also provides complementary prior and accordingly benchmark a standard multi-view framework with extensive experiments to demonstrate its consistent superiority over single-view opponents. We then delve into the intrinsic mechanism of the multi-view representation fusion, and propose ModalDrop as a novel modality-aware regularization method to learn and exploit representations of both views effectively. We demonstrate its advantage over existing representation fusion alternatives with comprehensive experiments and ablations. The improvements are consistent for various settings and are orthogonal with different backbones. We also discuss its potential application for other related tasks regarding representation or modality fusion. The source code is available on https://github.com/studyzx/ModalDrop.git.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP49357.2023.10095880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Although most existing work on sensor-based Human Activity Recognition relies on the temporal view, we argue that the spectral view provides a complementary prior. Accordingly, we benchmark a standard multi-view framework with extensive experiments and demonstrate its consistent superiority over single-view counterparts. We then delve into the intrinsic mechanism of multi-view representation fusion and propose ModalDrop, a novel modality-aware regularization method that learns and exploits the representations of both views effectively. We demonstrate its advantage over existing representation-fusion alternatives through comprehensive experiments and ablations. The improvements are consistent across settings and orthogonal to the choice of backbone. We also discuss its potential application to other tasks involving representation or modality fusion. The source code is available at https://github.com/studyzx/ModalDrop.git.
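
To make the idea concrete, below is a minimal, hypothetical sketch of what a temporal-spectral fusion model with modality-aware dropout might look like. The abstract does not specify ModalDrop's exact mechanism, so the dropping scheme shown here (randomly zeroing one view's representation during training), along with all names such as TemporalSpectralHAR and p_drop, are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: two-view (temporal + spectral) HAR model with a
# ModalDrop-style regularizer. Not the authors' code; the exact drop
# mechanism in the paper may differ.
import torch
import torch.nn as nn


class TemporalSpectralHAR(nn.Module):
    """Two-view HAR model: one encoder per view, fused by concatenation."""

    def __init__(self, in_channels: int, hidden: int = 64,
                 num_classes: int = 6, p_drop: float = 0.2):
        super().__init__()
        # Temporal encoder operates on the raw sensor sequence.
        self.temporal_enc = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Spectral encoder operates on the magnitude spectrum of the same window.
        self.spectral_enc = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(2 * hidden, num_classes)
        self.p_drop = p_drop

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) raw sensor window.
        z_t = self.temporal_enc(x).squeeze(-1)          # (batch, hidden)
        spec = torch.fft.rfft(x, dim=-1).abs()          # spectral view of the window
        z_s = self.spectral_enc(spec).squeeze(-1)       # (batch, hidden)

        if self.training:
            # Assumed ModalDrop-style regularization: with probability p_drop,
            # zero out one randomly chosen view's representation so the
            # classifier cannot over-rely on either modality.
            if torch.rand(1).item() < self.p_drop:
                if torch.rand(1).item() < 0.5:
                    z_t = torch.zeros_like(z_t)
                else:
                    z_s = torch.zeros_like(z_s)

        return self.classifier(torch.cat([z_t, z_s], dim=-1))


if __name__ == "__main__":
    model = TemporalSpectralHAR(in_channels=3)   # e.g. a 3-axis accelerometer
    model.train()
    logits = model(torch.randn(8, 3, 128))       # batch of 128-sample windows
    print(logits.shape)                          # torch.Size([8, 6])
```

The intent of a scheme like this is analogous to standard dropout, but at the granularity of whole modalities: by occasionally withholding one view, both encoders are forced to learn representations that remain useful on their own, which is consistent with the abstract's claim that the gains are orthogonal to the choice of backbone.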