Identifying Sound Source Node Locations Using Neural Networks Trained with Phasograms

2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) Pub Date : 2020-12-09 DOI:10.1109/ISSPIT51521.2020.9408643

A. Copiaco, C. Ritz, Stefano Fasciani, N. Abdulaziz


Citations: 1

Abstract



This work examines how well neural networks can approximate the location of a sound source. The majority of related work either omits the phase information from the Short-Time Fourier Transform (STFT) or uses it solely to restore irregularities in spectrograms. Our approach differs in that it focuses on the phase component of the STFT coefficients and estimates the sound source location by classifying the closest microphone array (node). Mapping the phase-difference information in the time-frequency domain yields images that we call phasograms, which are used as inputs to the neural network. Experiments use recordings from the first four nodes of the SINS database. Two variants were examined: phase differences between adjacent microphones, and phase differences relative to the first microphone. Under five-fold cross-validation, these achieved F1-scores of 99.68% and 99.31%, respectively. A real-world application of this work is healthcare monitoring, when integrated with a sound scene classification system.
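As a rough illustration of the two phase-difference variants described in the abstract, the following sketch computes per-pair phase differences from the STFT of a multichannel recording. It is only a sketch under stated assumptions: the abstract does not give the STFT parameters, the way phase differences are rendered as images, or the network architecture, so the function names (`stft`, `phasograms`), the Hann window, and the values of `n_fft` and `hop` are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT of a 1-D signal.

    Returns a complex array of shape (n_freq, n_frames).
    Window type and sizes are assumptions, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def phasograms(channels, mode="adjacent", n_fft=256, hop=128):
    """Phase differences between microphone pairs of one node.

    channels: array of shape (n_mics, n_samples).
    mode: "adjacent" pairs mic k+1 with mic k;
          "first" pairs every other mic with mic 0.
    Returns an array of shape (n_mics - 1, n_freq, n_frames)
    with values wrapped by np.angle to [-pi, pi].
    """
    spec = np.stack([stft(ch, n_fft, hop) for ch in channels])
    if mode == "adjacent":
        ref, other = spec[:-1], spec[1:]
    elif mode == "first":
        ref, other = spec[:1], spec[1:]  # mic 0 is broadcast over the rest
    else:
        raise ValueError("mode must be 'adjacent' or 'first'")
    # angle(other * conj(ref)) equals angle(other) - angle(ref), wrapped.
    return np.angle(other * np.conj(ref))


if __name__ == "__main__":
    # Synthetic 4-microphone tone with a fixed phase lag per microphone.
    n = np.arange(2048)
    mics = np.stack([np.cos(2 * np.pi * 16 * n / 256 + 0.3 * m) for m in range(4)])
    print(phasograms(mics, mode="adjacent").shape)
    print(phasograms(mics, mode="first").shape)
```

Each slice along the first axis could then be scaled to an image and passed to a classifier that predicts the closest node; that step is left out here because the abstract does not specify it.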
