Riyansha Singh (IIT Kanpur, India), Parinita Nema (IISER Bhopal, India), Vinod K Kurmi (IISER Bhopal, India)
arXiv - CS - Sound, arXiv:2407.19265, published 2024-07-27
Towards Robust Few-shot Class Incremental Learning in Audio Classification using Contrastive Representation
In machine learning applications, gradual data ingress is common, especially
in audio processing where incremental learning is vital for real-time
analytics. Few-shot class-incremental learning addresses challenges arising
from limited incoming data. Existing methods often add extra trainable
components or freeze the embedding extractor after base-session training to
mitigate catastrophic forgetting and overfitting. However, using
cross-entropy loss alone during
base session training is suboptimal for audio data. To address this, we propose
incorporating supervised contrastive learning to refine the representation
space. This enhances discriminative power and leads to better generalization,
since it facilitates seamless integration of incremental classes upon arrival.
Experimental results on the NSynth and LibriSpeech datasets with 100 classes, as
well as the ESC dataset with 50 and 10 classes, demonstrate state-of-the-art
performance.
performance.
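
To make the supervised contrastive objective mentioned above concrete, here is a minimal sketch of the commonly used formulation, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`l2_normalize`, `supcon_loss`), the temperature value, and the toy 2-D embeddings are all hypothetical, and the abstract does not say how this term is weighted against cross-entropy.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    For each anchor, same-label samples are pulled together and all other
    samples in the batch act as contrast. The temperature is an assumed
    hyperparameter, not a value taken from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability over the anchor's positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy batch: two tight clusters in 2-D (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(toy_embeddings, [0, 0, 1, 1])      # labels match clusters
misaligned = supcon_loss(toy_embeddings, [0, 1, 0, 1])   # labels cut across clusters
print(aligned < misaligned)
```

The loss is small when same-class embeddings already sit close together and large when they do not, which is the property the abstract relies on: a base-session representation with compact, well-separated classes leaves room for new classes to be added from only a few examples.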