Inferring Social Contexts from Audio Recordings Using Deep Neural Networks
Authors: Meysam Asgari, Izhak Shafran, Alireza Bayestehtashk
Venue: IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2014
DOI: 10.1109/MLSP.2014.6958853
Published: 2014-09-01 (Epub 2014-11-20)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7934587/pdf/nihms-1670823.pdf
Citations: 0
Abstract
In this paper, we investigate the problem of detecting social contexts from audio recordings of everyday life, such as life-logs. Unlike standard corpora of telephone speech or broadcast news, these recordings contain a wide variety of background noise. By nature, in such applications it is difficult to collect and label all the representative noise needed to learn models in a fully supervised manner, and the amount of labeled data that can be expected is small relative to the available recordings. This lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.
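The two-stage pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the input dimensions, the L1 sparsity penalty, the learning rates, and the synthetic data and labels are all assumptions made for the sake of a self-contained example. The first stage trains a sparse auto-encoder on unlabeled data; the second trains a logistic classifier on the learned codes of a small labeled subset.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stage 1: unsupervised feature learning on plentiful unlabeled data.
# Toy stand-in for frame-level audio features; real features and
# dimensions are not specified in the abstract.
X_unlab = rng.normal(size=(500, 20))
n_in, n_hid = 20, 8
W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in)); b2 = np.zeros(n_in)
lr, l1 = 0.01, 1e-3  # learning rate and L1 sparsity weight (assumed)

for _ in range(200):
    H = sigmoid(X_unlab @ W1 + b1)      # sparse codes (encoder output)
    X_hat = H @ W2 + b2                 # linear decoder reconstruction
    err = X_hat - X_unlab
    # Gradients of mean squared reconstruction loss + L1 penalty on H
    dW2 = H.T @ err / len(X_unlab)
    db2 = err.mean(axis=0)
    dH = err @ W2.T + l1 * np.sign(H)
    dZ = dH * H * (1 - H)               # sigmoid derivative
    dW1 = X_unlab.T @ dZ / len(X_unlab)
    db1 = dZ.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Stage 2: supervised classifier on the learned codes, using a much
# smaller labeled set (labels here are synthetic placeholders for
# social-context annotations).
X_lab = rng.normal(size=(100, 20))
y = (X_lab[:, 0] > 0).astype(float)
Z = sigmoid(X_lab @ W1 + b1)            # encode with frozen auto-encoder
w = np.zeros(n_hid); b = 0.0
for _ in range(500):
    p = sigmoid(Z @ w + b)
    g = p - y                           # logistic-loss gradient signal
    w -= 0.1 * (Z.T @ g / len(y))
    b -= 0.1 * g.mean()

acc = ((sigmoid(Z @ w + b) > 0.5) == y).mean()
```

The key design point the abstract motivates is the split: the encoder parameters (`W1`, `b1`) are fit on abundant unlabeled recordings, so the scarce labels only have to train the small classifier on top.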