{"title":"Improved Generalization from Limiting Attention in a Transformer for Sleep Stage Classification","authors":"Dongyoung Kim, Dong-Kyu Kim, Jeong-Gun Lee","doi":"10.1109/ICEIC61013.2024.10457194","DOIUrl":null,"url":null,"abstract":"A transformer architecture has been employed effectively on many tasks such as natural language processing and vision recognition. The most important and general requirement of utilizing the transformer-based architecture is that the model has to be trained on a large-scale dataset before it can be fine-tuned for downstream tasks. However, in our experiments, we figure out that the transformer-based architecture has better generalization capability to extract features from data samples in sleep stage classification than CNN-based architectures, even with a small-scale dataset without any extra pretraining step. In this paper, we show the strength of the transformer architecture with regard to generalization capability over the conventional CNN architecture in sleep stage classification tasks specifically using a small-scale dataset.","PeriodicalId":518726,"journal":{"name":"2024 International Conference on Electronics, Information, and Communication (ICEIC)","volume":"34 1","pages":"1-4"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 International Conference on Electronics, Information, and Communication (ICEIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIC61013.2024.10457194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Transformer architectures have been applied effectively to many tasks, such as natural language processing and visual recognition. The most common requirement for using a transformer-based architecture is that the model must first be trained on a large-scale dataset before it can be fine-tuned for downstream tasks. However, in our experiments we find that a transformer-based architecture generalizes better than CNN-based architectures when extracting features for sleep stage classification, even on a small-scale dataset and without any extra pretraining step. In this paper, we demonstrate the advantage of the transformer architecture over a conventional CNN architecture, in terms of generalization capability, on sleep stage classification with a small-scale dataset.
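
The abstract does not spell out the model details; as a purely illustrative sketch (not the paper's exact architecture), the snippet below shows one way to build a small transformer encoder for 5-class sleep stage classification and to "limit attention" by masking each 30-second epoch so it attends only to nearby epochs. The window size, layer dimensions, and the per-epoch CNN encoder are assumptions made for the example.

```python
# Minimal sketch, assuming: 1-channel 30-s EEG epochs at 100 Hz, 5 sleep stages,
# and a band-shaped attention mask standing in for "limited attention".
import torch
import torch.nn as nn


def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks positions that must NOT be attended to."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window


class SleepTransformer(nn.Module):
    def __init__(self, n_classes=5, d_model=128, n_heads=4, n_layers=2, window=3):
        super().__init__()
        # Per-epoch feature extractor over a raw 30-s EEG epoch (illustrative).
        self.epoch_encoder = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=50, stride=25), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)
        self.window = window

    def forward(self, x):  # x: (batch, n_epochs, samples_per_epoch)
        b, n, t = x.shape
        feats = self.epoch_encoder(x.reshape(b * n, 1, t)).reshape(b, n, -1)
        mask = local_attention_mask(n, self.window).to(x.device)
        ctx = self.encoder(feats, mask=mask)  # attention limited to a local window
        return self.head(ctx)                 # per-epoch stage logits


# Quick shape check with random data standing in for EEG recordings.
model = SleepTransformer()
logits = model(torch.randn(2, 20, 3000))  # 2 records, 20 epochs, 100 Hz * 30 s
print(logits.shape)                       # torch.Size([2, 20, 5])
```

Such a model can be trained from scratch with a standard cross-entropy loss over per-epoch labels; no large-scale pretraining step is assumed, matching the small-dataset setting the abstract describes.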