Edoardo Bucheli-Susarrey, Miguel González-Mendoza, Oscar Herrera-Alcántara
Voice Command Detection with Compact Deep Learning Models (original Spanish title: "Detección de comandos de voz con modelos compactos de aprendizaje profundo")
Journal article, Res. Comput. Sci. (Research in Computing Science), published 2019-12-31. DOI: 10.13053/rcs-148-7-26
Abstract
The keyword detection problem consists of locating a small vocabulary of words embedded in a stream of audio. Because keyword detection runs constantly in the background on many mobile devices, the models that perform it must have a small memory footprint and low computational cost. Using the Simple Speech Commands Detection data set, we present a comparative study of two types of layers. Hand-engineered layers are built from audio feature extraction techniques based on the Fourier transform and Mel filterbanks. Learned layers come from the deep learning literature and include dense, recurrent and convolutional layers. Following the deep learning pipeline, we combine these layers into models that solve the keyword detection task.
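The hand-engineered layers described in the abstract (a Fourier transform followed by Mel filterbanks) can be illustrated with a minimal NumPy sketch of a log-Mel front end. This is not the authors' implementation: the 16 kHz sample rate, 512-point FFT, 160-sample hop, 40 Mel bands, HTK-style Mel formula and the small log offset are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (assumed; other Mel variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def stft_magnitude(signal, n_fft=512, hop=160):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_fft // 2 + 1)


def mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000, f_min=20.0, f_max=None):
    """Fixed (non-learned) triangular filters spaced evenly on the Mel scale."""
    f_max = f_max or sample_rate / 2
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m - 1] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_features(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Power spectrum -> Mel filterbank energies -> log compression."""
    power = stft_magnitude(signal, n_fft, hop) ** 2
    mel_energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energies + 1e-6)  # offset avoids log(0)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                  # one second of audio (assumed clip length)
    tone = np.sin(2 * np.pi * 1000.0 * t)   # synthetic 1 kHz tone standing in for speech
    feats = log_mel_features(tone, sample_rate=sr)
    print(feats.shape)                      # (97, 40): frames x Mel bands
```

The resulting frames × Mel-bands matrix is the kind of input that learned layers (dense, recurrent or convolutional) would consume. Because the filterbank in this sketch is fixed rather than trained, it adds no learnable parameters, which helps keep the overall model compact.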