DNN based phrase boundary detection using knowledge-based features and feature representations from CNN
Authors: Pavan Kumar, Chiranjeevi Yarra, P. Ghosh
Published in: 2021 National Conference on Communications (NCC), 2021-07-27
DOI: 10.1109/NCC52529.2021.9530147 (https://doi.org/10.1109/NCC52529.2021.9530147)
Citations: 0
Abstract
Automatic phrase boundary detection can be useful in applications such as computer-assisted pronunciation tutoring, spoken language understanding, and automatic speech recognition. In this work, we consider phrase boundary detection on English utterances spoken by native speakers of American English. Most existing work on boundary detection uses either knowledge-based features or representations learned from word segments by a convolutional neural network (CNN) based architecture. We hypothesize that combining knowledge-based features with learned representations could improve boundary detection performance. To test this, we propose a fusion-based model built from a deep neural network (DNN) and CNNs: the CNNs learn representations, and the DNN combines the knowledge-based features with those learned representations. Further, unlike existing data-driven methods, we use two CNNs for representation learning, one for word segments and another for word-final syllable segments. Experiments on the Boston University radio news and Switchboard corpora show the benefit of the proposed fusion-based approach over two baselines, one using knowledge-based features only and the other using CNN feature representations only.
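The fusion idea described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the abstract does not give layer sizes, kernel widths, input features, or the pooling scheme, so every such choice here is an assumption made for illustration (1-D convolution over frames with ReLU and global max pooling, one hidden layer in the DNN, random weights). The names `FusionBoundaryModel`, `conv1d_maxpool`, and the placeholder knowledge-based feature values are hypothetical.

```python
import math
import random


def conv1d_maxpool(frames, kernels):
    """1-D convolution over time, then ReLU and global max pooling.

    frames:  list of T frame vectors, each of length D
    kernels: list of K kernels, each a list of W weight vectors of length D
    returns: K pooled activations, i.e. a fixed-size representation
    """
    pooled = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # starting at 0 makes the max act as ReLU + max pooling
        for t in range(len(frames) - width + 1):
            act = sum(
                w * x
                for w_vec, x_vec in zip(kernel, frames[t:t + width])
                for w, x in zip(w_vec, x_vec)
            )
            best = max(best, act)
        pooled.append(best)
    return pooled


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class FusionBoundaryModel:
    """Two CNN branches plus a DNN that fuses them with knowledge-based features.

    All sizes below are assumptions, not values from the paper.
    """

    def __init__(self, frame_dim, kb_dim, n_kernels=4, width=3, hidden=8, seed=0):
        rng = random.Random(seed)
        # One CNN for word segments, another for word-final syllable segments.
        self.word_kernels = [rand_matrix(rng, width, frame_dim) for _ in range(n_kernels)]
        self.syll_kernels = [rand_matrix(rng, width, frame_dim) for _ in range(n_kernels)]
        fused_dim = 2 * n_kernels + kb_dim
        self.w1 = rand_matrix(rng, hidden, fused_dim)
        self.b1 = [0.0] * hidden
        self.w2 = rand_matrix(rng, 1, hidden)
        self.b2 = [0.0]

    def fuse(self, word_frames, syll_frames, kb_features):
        """Concatenate both learned representations with the knowledge-based features."""
        return (conv1d_maxpool(word_frames, self.word_kernels)
                + conv1d_maxpool(syll_frames, self.syll_kernels)
                + list(kb_features))

    def predict(self, word_frames, syll_frames, kb_features):
        """Return a score in (0, 1) for 'a phrase boundary follows this word'."""
        fused = self.fuse(word_frames, syll_frames, kb_features)
        hidden = [max(0.0, h) for h in dense(fused, self.w1, self.b1)]
        logit = dense(hidden, self.w2, self.b2)[0]
        return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: random frame-level features stand in for real acoustic features.
rng = random.Random(1)
word_frames = rand_matrix(rng, 12, 5)   # 12 frames of a word segment, 5-dim each
syll_frames = rand_matrix(rng, 6, 5)    # 6 frames of its word-final syllable
kb_features = [0.3, 1.2, 0.0]           # placeholder knowledge-based features

model = FusionBoundaryModel(frame_dim=5, kb_dim=3)
fused = model.fuse(word_frames, syll_frames, kb_features)
score = model.predict(word_frames, syll_frames, kb_features)
print(len(fused), score)
```

Because the weights are random, the score carries no meaning; the sketch only shows how the data flows. Dropping one input from `fuse` gives something analogous to the two baselines named in the abstract: knowledge-based features only, or CNN representations only.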