Distortion discriminant analysis for audio fingerprinting

C. Burges, John C. Platt, S. Jana
Journal: IEEE Trans. Speech Audio Process., volume "33 1", pages 165-174
Published: 2003-07-09
DOI: 10.1109/TSA.2003.811538 (https://doi.org/10.1109/TSA.2003.811538)
Source: Semantic Scholar
Citations: 152

Abstract

Mapping audio data to feature vectors for the classification, retrieval or identification tasks presents four principal challenges. The dimensionality of the input must be significantly reduced; the resulting features must be robust to likely distortions of the input; the features must be informative for the task at hand; and the feature extraction operation must be computationally efficient. We propose distortion discriminant analysis (DDA), which fulfills all four of these requirements. DDA constructs a linear, convolutional neural network out of layers, each of which performs an oriented PCA dimensional reduction. We demonstrate the effectiveness of DDA on two audio fingerprinting tasks: searching for 500 audio clips in 36 h of audio test data; and playing over 10 days of audio against a database with approximately 240 000 fingerprints. We show that the system is robust to kinds of noise that are not present in the training procedure. In the large test, the system gives a false positive rate of 1.5 × 10^-8 per audio clip, per fingerprint, at a false negative rate of 0.2% per clip.
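
The abstract names oriented PCA (OPCA) as the building block of each DDA layer but gives no details. As a rough illustration only, the sketch below uses the textbook OPCA formulation: find directions n that maximize the ratio of signal variance to noise power, which amounts to the generalized eigenvalue problem C_signal n = q C_noise n. The function name opca_directions, the reg parameter, and the synthetic data are assumptions made for this example; this is a single layer on toy vectors, not the authors' implementation or their multi-layer convolutional network.

```python
import numpy as np


def opca_directions(clean, distorted, n_components, reg=1e-6):
    """Hypothetical helper: one oriented-PCA projection.

    clean, distorted: arrays of shape (n_samples, n_features); row i of
    `distorted` is assumed to be a distorted copy of row i of `clean`.
    Returns (directions, ratios): unit-norm columns and their
    signal-to-noise generalized eigenvalues, largest first.
    """
    centered = clean - clean.mean(axis=0)
    signal_cov = centered.T @ centered / len(clean)

    noise = distorted - clean
    noise_corr = noise.T @ noise / len(noise)
    # Small ridge term (an assumption) keeps the noise matrix invertible.
    noise_corr = noise_corr + reg * np.eye(noise_corr.shape[0])

    # Whiten with the Cholesky factor: noise_corr = L @ L.T, so the
    # generalized problem becomes an ordinary symmetric eigenproblem
    # for M = L^{-1} signal_cov L^{-T}.
    L = np.linalg.cholesky(noise_corr)
    half = np.linalg.solve(L, signal_cov)      # L^{-1} C_signal
    M = np.linalg.solve(L, half.T).T           # L^{-1} C_signal L^{-T}
    M = (M + M.T) / 2                          # remove round-off asymmetry

    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]

    # Map whitened eigenvectors back to the input space: n = L^{-T} v.
    directions = np.linalg.solve(L.T, eigvecs[:, order])
    directions = directions / np.linalg.norm(directions, axis=0)
    return directions, eigvals[order]


# Toy data: equal signal variance in all three features, but the
# distortion is strong in feature 0 and weak in feature 1.
rng = np.random.default_rng(0)
clean = rng.normal(size=(2000, 3))
distorted = clean + rng.normal(size=(2000, 3)) * np.array([2.0, 0.1, 1.0])

directions, ratios = opca_directions(clean, distorted, n_components=1)
features = distorted @ directions  # reduced, distortion-robust features
```

On this toy data the leading direction should line up with feature 1, the one the simulated distortion affects least, whereas plain PCA would have no reason to prefer it because all three features carry the same signal variance.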