Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642211
Tian Zheng, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu
Attention mechanisms play a significant role in the current encoder-decoder framework for image captioning. Nevertheless, many attention mechanisms fuse textual and image features only once, failing to adequately integrate context and image features. Furthermore, many scene-graph-based image captioning networks consider only the node information and ignore the structure, which is insufficient for grasping the spatial relationships between objects. To address these problems, we propose structural attention and increased global attention. The two attentions select critical image features from image details and from the global image. The increased global attention, which focuses on global image features, enhances the integration of text and image by fusing detailed image features into global attention. To better describe the relationships among image objects, our network takes into account both node information, through content attention, and structure information, through structural attention. Structural attention computes the similarity between the structure information of the scene graph and local attention, building object relationships that differ from those captured by content attention. We evaluate our image captioning network on the MS COCO and Visual Genome datasets. The experimental results show that our method achieves superior performance compared with existing methods.
{"title":"Graph Structural Attention and Increased Global Attention for Image Captioning","authors":"Tian Zheng, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu","doi":"10.1109/ICICIP53388.2021.9642211","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642211","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"125108092"}
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642213
Xiao Xiao, Bolin Liao, Qiuqing Long, Yongjun He, J. Li, Luyang Han
The traditional extreme learning machine (ELM) requires a large number of hidden-layer neurons in its applications, and its ability to process high-dimensional big-data samples is weak. In response to these problems, this paper proposes an improved extreme learning machine algorithm based on deep learning. The algorithm combines the double pseudo-inverse extreme learning machine (DPELM), which has high classification accuracy and a simple network structure, with the denoising autoencoder (DAE), which can extract more essential data features. The DAE extracts features from the data to be recognized, and the DPELM acts mainly as a classifier that quickly classifies and recognizes the extracted features. Experimental results show that, in handwritten digit recognition, the double pseudo-inverse extreme learning machine based on a denoising autoencoder (DAE-DPELM) needs only a small number of hidden-layer neurons. In addition, compared with the traditional ELM and DAE-ELM algorithms, the DAE-DPELM algorithm achieves higher classification accuracy.
{"title":"Improved Extreme Learning Machine Based on Deep Learning and Its Application in Handwritten Digits Recognition","authors":"Xiao Xiao, Bolin Liao, Qiuqing Long, Yongjun He, J. Li, Luyang Han","doi":"10.1109/ICICIP53388.2021.9642213","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642213","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"123764473"}
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642184
Yumeng Cai, Guoyong Cai, Jin Cai
Action recognition methods are mostly based on 3-dimensional (3D) convolutional networks, which have some limitations in practice, e.g. redundant parameters, high memory consumption, and low performance. In this paper, a new convolution-free model called action-transformer is proposed to address these problems. The proposed model is composed mainly of three modules: a spatial-temporal transformation module, a hybrid feature attention module, and a residual-transformer module. The spatial-temporal transformation module maps the split short video into spatial and temporal features. The hybrid feature attention module extracts fine-grained features from the spatial and temporal features and produces hybrid features. The residual-transformer module combines attention, a feed-forward network, and the residual mechanism to extract local and global features from the hybrid features. The model is tested on the HMDB51 and UCF101 datasets, and the results show that the proposed model uses less memory and fewer parameters than the models reported in the literature while also achieving better performance.
{"title":"Action-Transformer for Action Recognition in Short Videos","authors":"Yumeng Cai, Guoyong Cai, Jin Cai","doi":"10.1109/ICICIP53388.2021.9642184","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642184","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"133805967"}
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642175
Zhaolun Li, Xiao-peng Luo
For autonomous underwater vehicles (AUVs), autonomous navigation in an unknown underwater environment is still a difficult problem. In recent years, some machine-learning-based methods have been proposed to solve this problem, but the existing methods still cannot cope with the complex and changeable underwater environment. This paper studies path planning for autonomous underwater vehicles: it combines deep learning and reinforcement learning, uses a WL interpolation surface to model the seabed, and proposes a path planning model for autonomous underwater vehicles based on deep reinforcement learning. The path planning model is trained in a simulation environment and finally achieves the goal of path planning for the underwater robot in the complex and changeable underwater environment.
{"title":"Autonomous underwater vehicles (AUVs) path planning based on Deep Reinforcement Learning","authors":"Zhaolun Li, Xiao-peng Luo","doi":"10.1109/ICICIP53388.2021.9642175","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642175","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"128358018"}
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642224
Lu Wang, Xiaoyun Zhang, Huidong Wang, Chuanzheng Bai
The k-means (KM) clustering algorithm is well known for its simplicity and efficiency. However, the clustering result is greatly influenced by the selection of the initial centers. To solve this problem, one of the improved algorithms is global k-means (GKM), which performs the clustering process in an incremental manner. This incremental manner frees GKM from the influence of initial point selection and lets it reach globally optimal or near-optimal results. However, GKM has a high computational cost. Therefore, an improved global k-means (IGKM) algorithm is proposed that uses a new guarantee reduction to reduce the computational load of GKM. The centroid theorem is introduced to reduce the computation time further. Simulation results on 14 datasets demonstrate that our IGKM algorithm obtains better clustering results and requires less running time.
{"title":"An improved global k-means clustering algorithm","authors":"Lu Wang, Xiaoyun Zhang, Huidong Wang, Chuanzheng Bai","doi":"10.1109/ICICIP53388.2021.9642224","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642224","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"128467565"}
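The incremental scheme that the k-means abstract attributes to GKM can be sketched as follows. This is a minimal illustration of the baseline global k-means idea only: the solution grows one center at a time, every data point is tried as the seed for the new center, and the best result is kept. It does not implement the IGKM reduction or the centroid theorem, whose details the abstract does not give. The function names and the toy data are assumptions made for illustration.

```python
import math


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def lloyd(points, centers, iters=100):
    """Plain k-means (Lloyd) iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; an empty
        # cluster keeps its previous center.
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def global_kmeans(points, k):
    """Add one center at a time, trying every data point as the new seed."""
    # The optimal single center is the mean of all points.
    centers = [tuple(sum(xs) / len(points) for xs in zip(*points))]
    for _ in range(1, k):
        best = None
        for seed in points:
            candidate = lloyd(points, centers + [tuple(seed)])
            if best is None or sse(points, candidate) < sse(points, best):
                best = candidate
        centers = best
    return centers


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = global_kmeans(points, 2)
print(sorted(centers))
```

Each added center costs one full k-means run per data point, which is the computational cost that the abstract says IGKM aims to reduce.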
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642180
Liya Ma, N. Zhang, Kuang-I Shu, Xitao Zou
Because it makes up for the constrained representation capability of hashing codes for high-dimensional data, the quantization method has generally been found to perform better in cross-modal similarity retrieval research. However, in current quantization approaches, the codebook, the most critical basis for quantization, remains passive and detached from the learning framework. To give the codebook a more active role, we propose semantic-consistent deep quantization (SCDQ), which is the first scheme to integrate quantization into deep network learning in an end-to-end fashion. Specifically, two classifiers following the deep representation learning networks are formulated to produce class-wise abstract patterns with the help of label alignment. Meanwhile, our approach learns a collaborative codebook for both modalities, which embeds bimodal semantically consistent information in the codewords and bridges the patterns in the classifiers and the codewords in the codebook. By designing a novel algorithm architecture and codebook update strategy, SCDQ enables effective and efficient cross-modal retrieval in an asymmetric way. Extensive experiments on two benchmark datasets demonstrate that SCDQ yields optimal cross-modal retrieval performance and outperforms several state-of-the-art cross-modal retrieval methods.
{"title":"Semantic-Consistent Deep Quantization for Cross-modal Retrieval","authors":"Liya Ma, N. Zhang, Kuang-I Shu, Xitao Zou","doi":"10.1109/ICICIP53388.2021.9642180","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642180","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"134212842"}
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642201
Jianzhen Xiao, Canhui Chen, Yunong Zhang
In this paper, we first propose a Zhang neural dynamics (ZND) model for the generalized Sinkhorn scaling of a time-varying matrix. Specifically, by using a dimension reduction technique, a continuous-time ZND model of time-varying matrix scaling is proposed and analyzed. In addition, the corresponding theoretical proofs are given, which establish the theoretical validity of the proposed ZND model. Moreover, two numerical experiments, covering a square case and a rectangular case, are conducted. The numerical results substantiate the effectiveness and accuracy of the proposed ZND model.
{"title":"Continuous ZND (Zhang Neural Dynamics) Model for Generalized Sinkhorn Scaling of Time-Varying Matrix","authors":"Jianzhen Xiao, Canhui Chen, Yunong Zhang","doi":"10.1109/ICICIP53388.2021.9642201","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642201","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"134474940"}
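For readers unfamiliar with the scaling problem named in the ZND abstract, the sketch below shows the classical static Sinkhorn iteration for a constant positive matrix: it looks for positive vectors u and v such that diag(u) A diag(v) has prescribed row and column sums. This is not the ZND model of the paper, which handles a time-varying matrix in continuous time; the function name, the test matrix, and the target sums are assumptions chosen for illustration, with a rectangular matrix picked only to echo the rectangular case the abstract mentions.

```python
def sinkhorn_scale(A, row_sums, col_sums, iters=200):
    """Scale a positive matrix A to the given row and column sums.

    Alternately rescales rows and columns; sum(row_sums) must equal
    sum(col_sums) for a solution to exist.
    """
    m, n = len(A), len(A[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Choose u so that every row of diag(u) A diag(v) hits its target.
        u = [row_sums[i] / sum(A[i][j] * v[j] for j in range(n))
             for i in range(m)]
        # Choose v so that every column hits its target.
        v = [col_sums[j] / sum(A[i][j] * u[i] for i in range(m))
             for j in range(n)]
    return [[u[i] * A[i][j] * v[j] for j in range(n)] for i in range(m)]


# Hypothetical 2 x 3 example: rows should sum to 3, columns to 2.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
S = sinkhorn_scale(A, row_sums=[3.0, 3.0], col_sums=[2.0, 2.0, 2.0])
for row in S:
    print([round(x, 6) for x in row])
```

In the time-varying setting the matrix changes while the iteration runs, so a static loop of this kind would have to be restarted at every time instant.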
In modern astronomy, pulsar identification is a vital task that helps researchers discover new pulsars. With the great progress of modern radio telescopes, the amount of pulsar data collected is increasing exponentially, and traditional pulsar identification approaches are no longer adequate for such large datasets. At present, many pulsar identification methods based on deep neural networks achieve promising performance. However, these neural-network-based methods still face the sample imbalance problem, which limits their performance. Specifically, only an extremely limited number of real pulsar samples exist in the datasets. To alleviate the problem and enhance pulsar identification performance, we present a novel method under the framework of synergetic learning systems that includes a variational autoencoder and a residual network. In this work, the variational autoencoder is used to generate high-quality pulsar samples for the training procedure to mitigate the sample imbalance problem, and a residual-network-based model is then presented to improve pulsar candidate identification performance. Extensive experiments on two pulsar datasets demonstrate that our framework not only alleviates the imbalance problem but also improves the accuracy of pulsar identification.
{"title":"Pulsar Identification Based on Variational Autoencoder and Residual Network","authors":"Guiru Liu, Yefan Li, Zelun Bao, Qian Yin, Ping Guo","doi":"10.1109/ICICIP53388.2021.9642198","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642198","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"132890915"}
This paper proposes a form-finding method for symmetric tensegrity structures based on the eigenvalue minimization problem of the force density matrix. The topology is the only prerequisite information about the structure. The problem of solving for the force densities in a self-equilibrated tensegrity structure is transformed into a linear optimization problem in which the force density matrix is subject to the rank deficiency condition. The constraints of the objective function are established from the characteristics of the member forces and from group theory. Once the force densities are obtained, the nodal coordinates can be determined by eigenvalue decomposition. To show the efficiency of the proposed method, several simulations of planar and spatial tensegrity structures are demonstrated. The results show that the form-finding process for symmetric tensegrity structures in the proposed method is fast and highly precise.
{"title":"An Improved Form-Finding Method for Calculating Force Density with Group Theory","authors":"Taotao Heng, Liming Zhao, Keping Liu, Jiang Yi, Xiao-jun Duan, Zhongbo Sun","doi":"10.1109/ICICIP53388.2021.9642188","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642188","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"121781361"}
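The force density matrix at the centre of the form-finding abstract can be illustrated with a small, standard example. The sketch below assembles the matrix D = C^T diag(q) C member by member for a planar tensegrity made of a square of four cables braced by two diagonal struts, and checks that the square's nodal coordinates satisfy the self-equilibrium condition D x = 0 and D y = 0. The structure, the force density values (+1 for cables, -1 for struts), and the function names are illustrative assumptions; this is textbook force density bookkeeping, not the optimization procedure of the paper.

```python
def force_density_matrix(n_nodes, members, q):
    """Assemble D = C^T diag(q) C from a member list and force densities."""
    D = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), qk in zip(members, q):
        # Each member adds qk to both diagonal entries and -qk to the
        # two off-diagonal entries that couple its end nodes.
        D[i][i] += qk
        D[j][j] += qk
        D[i][j] -= qk
        D[j][i] -= qk
    return D


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Hypothetical planar example: nodes 0..3 go around a unit square.
members = [(0, 1), (1, 2), (2, 3), (3, 0),   # four cables (tension)
           (0, 2), (1, 3)]                   # two struts (compression)
q = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0]
D = force_density_matrix(4, members, q)

xs = [0.0, 1.0, 1.0, 0.0]
ys = [0.0, 0.0, 1.0, 1.0]
residual_x = matvec(D, xs)   # should vanish in self-equilibrium
residual_y = matvec(D, ys)
print(D)
print(residual_x, residual_y)
```

For this choice of force densities D has rank 1, so its null space has dimension 3, which is the rank deficiency a planar structure needs in order to admit a non-degenerate self-equilibrated shape.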
Pub Date: 2021-12-03 DOI: 10.1109/ICICIP53388.2021.9642159
Huanbin Zou, Jie Zhu
Recently, deep-learning-based speech enhancement approaches have been researched extensively. Most methods focus on reconstructing the magnitude spectrum of the target clean speech from that of the noisy speech, and then combine it with the noisy speech’s phase spectrum to synthesize the waveform. In this paper, we propose a complex-network-based model called the Deep Complex Temporal Convolutional Network (DCTCN) to estimate the complex-valued short-time Fourier transform (STFT) of the target speech from noisy speech. We design a temporal convolutional network (TCN) block based on complex dilated causal convolution. The proposed DCTCN achieves outstanding denoising performance with a low complexity of 1.33M parameters. The experiments are conducted on the DNS Challenge dataset, and the results show that complex operations and TCN blocks have significant positive effects on noise suppression.
{"title":"DCTCN: Deep Complex Temporal Convolutional Network for Real Time Speech Enhancement","authors":"Huanbin Zou, Jie Zhu","doi":"10.1109/ICICIP53388.2021.9642159","DOIUrl":"https://doi.org/10.1109/ICICIP53388.2021.9642159","journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)"},"publicationDate":"2021-12-03","platform":"Semanticscholar","paperid":"125880629"}
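The building block named in the DCTCN abstract, a complex dilated causal convolution, can be sketched in one dimension as follows. The output at time t is a weighted sum of the current sample and earlier samples spaced `dilation` steps apart, with complex-valued weights and inputs, and samples before the start of the signal treated as zero. This is a minimal single-channel illustration using Python's built-in complex type; the function name, the weights, and the input are assumptions, and the actual DCTCN layer configuration is not given in the abstract.

```python
def complex_dilated_causal_conv(x, w, dilation=1):
    """y[t] = sum_k w[k] * x[t - k * dilation], with x[t] = 0 for t < 0.

    Only current and past samples are used, so the operation is causal.
    Each product is a complex multiplication:
    (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
    """
    out = []
    for t in range(len(x)):
        acc = 0j
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:          # skip taps that reach before the signal
                acc += wk * x[idx]
        out.append(acc)
    return out


# Hypothetical complex-valued sequence (for example, one STFT bin over time).
x = [1 + 1j, 2 + 0j, 0 + 1j, 1 - 1j]
w = [1 + 0j, 0 + 1j]              # two taps: current sample, delayed sample
y = complex_dilated_causal_conv(x, w, dilation=2)
print(y)  # [(1+1j), (2+0j), (-1+2j), (1+1j)]
```

With a dilation of 2, the second tap reaches two steps into the past, so stacking such layers with growing dilation enlarges the receptive field without looking at future frames, which is what makes the structure suitable for real-time processing.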