Incorporating syllable duration into line-detection-based spoken term detection
Teppei Ohno, T. Akiba. 2012 IEEE Spoken Language Technology Workshop (SLT), December 2012. DOI: 10.1109/SLT.2012.6424223
A conventional approach to spoken term detection (STD) applies approximate string matching to the subword sequences that speech recognition produces for a spoken document. A previously proposed STD method treats this string matching as line detection in a syllable distance plane. Although that method returns detections quickly and in order of distance, it remains vulnerable to the insertion and deletion errors introduced by speech recognition. In this work, we aim to improve detection performance by using syllable-duration information. The proposed method makes detection more robust by introducing a distance plane whose units are frames rather than syllables. Our experimental evaluation showed that incorporating syllable duration yielded higher detection performance in high-recall regions.
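The contrast between a syllable-unit and a frame-unit distance plane can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes a 0/1 syllable distance (a real system would use a graded acoustic or phonetic distance) and hypothetical query durations (here a fixed 10 frames per syllable). It also replaces line detection with a simplified detector that only scores full-length slope-1 diagonals by their mean distance.

```python
from typing import List, Tuple


def syllable_distance(a: str, b: str) -> float:
    """Hypothetical 0/1 distance between two syllable labels.

    A real STD system would use a graded acoustic or phonetic distance;
    0/1 keeps the sketch small.
    """
    return 0.0 if a == b else 1.0


def expand_to_frames(syllables: List[str], durations: List[int]) -> List[str]:
    """Repeat each syllable label once per frame it occupies."""
    frames: List[str] = []
    for syl, dur in zip(syllables, durations):
        frames.extend([syl] * dur)
    return frames


def distance_plane(query: List[str], doc: List[str]) -> List[List[float]]:
    """plane[i][j] is the distance between query unit i and document unit j."""
    return [[syllable_distance(q, d) for d in doc] for q in query]


def detect_lines(plane: List[List[float]]) -> List[Tuple[int, float]]:
    """Score each slope-1 diagonal by its mean distance, best (lowest) first.

    Simplification: only diagonals spanning the whole query axis are scored.
    The returned int is the document column (syllable or frame index)
    where the diagonal starts.
    """
    n_q = len(plane)
    if n_q == 0:
        return []
    n_d = len(plane[0])
    results: List[Tuple[int, float]] = []
    for start in range(n_d - n_q + 1):
        total = sum(plane[i][start + i] for i in range(n_q))
        results.append((start, total / n_q))
    results.sort(key=lambda r: (r[1], r[0]))
    return results


if __name__ == "__main__":
    query = ["ka", "ki", "ku"]
    # Recognized document with a short inserted syllable "i" (2 frames).
    doc = ["ka", "ki", "i", "ku"]
    doc_durations = [10, 9, 2, 10]
    # Hypothetical query durations: an assumed 10 frames per syllable.
    query_durations = [10, 10, 10]

    by_syllable = detect_lines(distance_plane(query, doc))
    by_frame = detect_lines(
        distance_plane(expand_to_frames(query, query_durations),
                       expand_to_frames(doc, doc_durations)))
    print("syllable plane best:", by_syllable[0])  # start 0, mean distance 1/3
    print("frame plane best:", by_frame[0])        # start 0, mean distance 2/30
```

In this toy example, the inserted two-frame syllable spoils one of three cells on the best syllable-plane diagonal, giving a mean distance of 1/3. On the frame-plane diagonal it spoils only two of thirty cells, giving 2/30. This shows why a frame-unit plane can tolerate short insertion errors. A fixed slope of 1 ignores speaking-rate differences between query and document, which a fuller line detector would need to handle.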