音频指纹失真判别分析

IEEE Trans. Speech Audio Process. Pub Date : 2003-07-09 DOI:10.1109/TSA.2003.811538

C. Burges, John C. Platt, S. Jana

{"title":"音频指纹失真判别分析","authors":"C. Burges, John C. Platt, S. Jana","doi":"10.1109/TSA.2003.811538","DOIUrl":null,"url":null,"abstract":"Mapping audio data to feature vectors for the classification, retrieval or identification tasks presents four principal challenges. The dimensionality of the input must be significantly reduced; the resulting features must be robust to likely distortions of the input; the features must be informative for the task at hand; and the feature extraction operation must be computationally efficient. We propose distortion discriminant analysis (DDA), which fulfills all four of these requirements. DDA constructs a linear, convolutional neural network out of layers, each of which performs an oriented PCA dimensional reduction. We demonstrate the effectiveness of DDA on two audio fingerprinting tasks: searching for 500 audio clips in 36 h of audio test data; and playing over 10 days of audio against a database with approximately 240 000 fingerprints. We show that the system is robust to kinds of noise that are not present in the training procedure. In the large test, the system gives a false positive rate of 1.5 /spl times/ 10/sup -8/ per audio clip, per fingerprint, at a false negative rate of 0.2% per clip.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"33 1","pages":"165-174"},"PeriodicalIF":0.0000,"publicationDate":"2003-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"152","resultStr":"{\"title\":\"Distortion discriminant analysis for audio fingerprinting\",\"authors\":\"C. Burges, John C. Platt, S. Jana\",\"doi\":\"10.1109/TSA.2003.811538\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Mapping audio data to feature vectors for the classification, retrieval or identification tasks presents four principal challenges. The dimensionality of the input must be significantly reduced; the resulting features must be robust to likely distortions of the input; the features must be informative for the task at hand; and the feature extraction operation must be computationally efficient. We propose distortion discriminant analysis (DDA), which fulfills all four of these requirements. DDA constructs a linear, convolutional neural network out of layers, each of which performs an oriented PCA dimensional reduction. We demonstrate the effectiveness of DDA on two audio fingerprinting tasks: searching for 500 audio clips in 36 h of audio test data; and playing over 10 days of audio against a database with approximately 240 000 fingerprints. We show that the system is robust to kinds of noise that are not present in the training procedure. In the large test, the system gives a false positive rate of 1.5 /spl times/ 10/sup -8/ per audio clip, per fingerprint, at a false negative rate of 0.2% per clip.\",\"PeriodicalId\":13155,\"journal\":{\"name\":\"IEEE Trans. Speech Audio Process.\",\"volume\":\"33 1\",\"pages\":\"165-174\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"152\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Trans. Speech Audio Process.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TSA.2003.811538\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Trans. Speech Audio Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TSA.2003.811538","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 152

摘要

将音频数据映射到特征向量，用于分类、检索或识别任务，提出了四个主要挑战。输入的维数必须显著降低;得到的特征必须对可能的输入失真具有鲁棒性;特征必须对手头的任务提供信息;特征提取操作必须计算效率高。我们提出失真判别分析(DDA)，它满足所有这四个要求。DDA构建了一个线性的卷积神经网络，每一层都执行一个定向的PCA降维。我们在两个音频指纹任务上证明了DDA的有效性:在36小时的音频测试数据中搜索500个音频片段;在数据库中比对了超过10天的音频和大约24万个指纹。我们证明了该系统对训练过程中不存在的各种噪声具有鲁棒性。在大型测试中，系统给出的假阳性率为每个音频片段，每个指纹1.5 /spl乘以/ 10/sup -8/，每个片段的假阴性率为0.2%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Distortion discriminant analysis for audio fingerprinting

Mapping audio data to feature vectors for the classification, retrieval or identification tasks presents four principal challenges. The dimensionality of the input must be significantly reduced; the resulting features must be robust to likely distortions of the input; the features must be informative for the task at hand; and the feature extraction operation must be computationally efficient. We propose distortion discriminant analysis (DDA), which fulfills all four of these requirements. DDA constructs a linear, convolutional neural network out of layers, each of which performs an oriented PCA dimensional reduction. We demonstrate the effectiveness of DDA on two audio fingerprinting tasks: searching for 500 audio clips in 36 h of audio test data; and playing over 10 days of audio against a database with approximately 240 000 fingerprints. We show that the system is robust to kinds of noise that are not present in the training procedure. In the large test, the system gives a false positive rate of 1.5 /spl times/ 10/sup -8/ per audio clip, per fingerprint, at a false negative rate of 0.2% per clip.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Trans. Speech Audio Process.

自引率

0.00%

发文量