{"title":"基于DPTNet的稀疏注意语音分离","authors":"Beom Jun Woo, H. Kim, Jeunghun Kim, N. Kim","doi":"10.1109/IC-NIDC54101.2021.9660488","DOIUrl":null,"url":null,"abstract":"This paper presents a sparse attention-based speech separation algorithm separating and generating clean speech from mixed audio containing speech from multiple speakers. Recent development of deep learning has enabled several speech separation models to generate clean speech audios. Especially speech separation models based on transformer show high performance due to their ability to learn long term dependencies compared with other neural network structures. However, as a transformer with self-attention falls short of catching short-term dependencies, we adopt sparse attention structure to the original transformer-based speech separation model. We show that the model with sparse attention outperforms the original full attention method.","PeriodicalId":264468,"journal":{"name":"2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Speech Separation Based on DPTNet with Sparse Attention\",\"authors\":\"Beom Jun Woo, H. Kim, Jeunghun Kim, N. Kim\",\"doi\":\"10.1109/IC-NIDC54101.2021.9660488\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a sparse attention-based speech separation algorithm separating and generating clean speech from mixed audio containing speech from multiple speakers. Recent development of deep learning has enabled several speech separation models to generate clean speech audios. Especially speech separation models based on transformer show high performance due to their ability to learn long term dependencies compared with other neural network structures. However, as a transformer with self-attention falls short of catching short-term dependencies, we adopt sparse attention structure to the original transformer-based speech separation model. We show that the model with sparse attention outperforms the original full attention method.\",\"PeriodicalId\":264468,\"journal\":{\"name\":\"2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IC-NIDC54101.2021.9660488\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IC-NIDC54101.2021.9660488","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Speech Separation Based on DPTNet with Sparse Attention
This paper presents a sparse-attention-based speech separation algorithm that separates and generates clean speech from mixed audio containing speech from multiple speakers. Recent developments in deep learning have enabled several speech separation models to generate clean speech audio. In particular, transformer-based speech separation models achieve high performance owing to their ability to learn long-term dependencies compared with other neural network structures. However, because a transformer with full self-attention falls short in capturing short-term dependencies, we apply a sparse attention structure to the original transformer-based speech separation model. We show that the model with sparse attention outperforms the original full-attention method.
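The abstract does not specify the exact sparsity pattern used in place of full self-attention. As a minimal sketch of the general idea, assuming a banded (local-window) mask, which is one common sparse-attention variant, attention scores outside a fixed window around each frame can be masked out before the softmax so that each position attends only to its temporal neighborhood. The function name banded_sparse_attention and the window parameter below are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (not the authors' code): restrict self-attention
# to a local band so each time frame attends only to nearby frames,
# approximating the sparse attention idea described above. The banded
# mask and window size are assumptions for illustration.
import torch
import torch.nn.functional as F

def banded_sparse_attention(q, k, v, window=16):
    # q, k, v: tensors of shape (batch, seq_len, dim).
    # window: each position may attend only to positions within
    # `window` steps of itself (a hypothetical choice).
    seq_len, dim = q.size(1), q.size(2)
    scores = q @ k.transpose(-2, -1) / dim ** 0.5  # (batch, seq, seq)

    # Boolean band mask: True where |i - j| > window (to be masked out).
    idx = torch.arange(seq_len, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() > window
    scores = scores.masked_fill(band, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Minimal usage example on random frame features.
q = k = v = torch.randn(2, 100, 64)
out = banded_sparse_attention(q, k, v, window=8)
print(out.shape)  # torch.Size([2, 100, 64])

In a dual-path model such as DPTNet, a mask of this kind would bias the intra-chunk transformer toward short-term dependencies while leaving the rest of the architecture unchanged; the paper's reported gain over full attention is consistent with this motivation, though the specific pattern it uses may differ.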