2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT): Latest Publications

A Joint Detection-Classification Model for Weakly Supervised Sound Event Detection Using Multi-Scale Attention Method
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408948
Yaoguang Wang, Liang He
Attention mechanisms have been applied to weakly supervised sound event detection (SED) and have achieved state-of-the-art performance, but most methods attend only along the time axis. In this paper, we propose the multi-scale time-frequency attention (MTFA) method to capture intrinsic features at different scales in both the time and frequency domains for audio tagging (AT) and SED. Our model is a unified network that performs AT and SED simultaneously: an MTFA module produces multi-scale attention-aware representations for SED, and a global pooling module maps these representations to the presence probability of each audio event for AT. To evaluate the proposed method, we conduct experiments on Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. It achieves an F1-score of 57.9% on the AT task and an error rate of 0.71 on the SED task on the evaluation set, which is comparable to the state-of-the-art results in the challenge.
Citations: 0
Vector Quantizer with Fuzzy Equivalence Relations clustering
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408873
S. Chakraborty, M. Fowler
The scalar quantizer is often used in many applications due to its simplicity and ease with which it can be implemented. However, whenever we have some constraint in terms of bit rate or distortion, the vector quantizer is almost always a better choice. This is because for a given bit rate or for a given distortion, we can always design a vector quantizer that outperforms the optimal scalar quantizer. There are several algorithms to design a vector quantizer. But, the most popular algorithm is the Linde-Buzo-Gray algorithm which is based on the k-means clustering. For the LBG algorithm, we need to specify the number of clusters as well as the initial reconstruction vectors, which are then updated in successive iterations. Often, choosing the initial reconstruction vectors is not an easy task, especially when we deal with higher dimensions. A better option would be to naturally obtain the initial partitions from the given dataset. In the present article, we describe a hierarchical clustering based vector quantizer design. With our approach, we no longer need to choose the initial reconstruction vectors, but we naturally obtain the partitions for the given bit rate. Moreover, once we obtain the partitions, we simply place our reconstruction vectors at the centroid of the partitions and hence we avoid performing successive iterations and updating the clusters.
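The LBG iteration that the abstract uses as its baseline can be sketched as below. This is a minimal illustration of the generalized Lloyd (k-means style) update, not the authors' hierarchical design; the function name `lbg`, the synthetic data, and the initial codebook are assumptions made for the example.

```python
import random

def lbg(vectors, codebook, iterations=20):
    """Refine an initial codebook with LBG-style (generalized Lloyd) updates."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        # Partition step: assign each vector to its nearest reconstruction vector.
        cells = [[] for _ in codebook]
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
            cells[dists.index(min(dists))].append(v)
        # Centroid step: move each reconstruction vector to the mean of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell leaves its codeword unchanged
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

# Hypothetical 2-D training set: two tight clusters near (0, 0) and (5, 5).
random.seed(0)
data = ([[random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(200)]
        + [[random.gauss(5, 0.1), random.gauss(5, 0.1)] for _ in range(200)])

# The caller must supply the initial reconstruction vectors, which is the
# step the abstract describes as difficult in higher dimensions.
codebook = lbg(data, codebook=[[1.0, 1.0], [4.0, 4.0]])
```

The centroid step is the part the abstract says is retained: once a partition is available, each reconstruction vector is placed at the centroid of its cell. The repeated partition step is the iteration the proposed design aims to avoid.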
Citations: 0
Single Channel QRS Detection Using Wavelet And Median Denoising With Adaptive Multilevel Thresholding
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408699
S. Modak, L. Taha, E. Abdel-Raheem
The study of heartbeats in electrocardiogram (ECG) signals is very important to sustain good health. Any anomalies in the heart rhythm can be detected by carefully studying the ECG signal. The detection of the QRS is obstructed by external and internal sources of noise. Automatic detection of the QRS is achieved by diminishing these noises to a minimum by different types of filtering such as band-pass filtering, wavelet transform, and applying thresholds. This paper presents a new method of QRS detection using discrete wavelet transform (DWT), median filtering, and adaptive multilevel thresholding (AMT). The proposed method is tested for the MIT-BIH Arrhythmia database and shows a high sensitivity of 99.74%, positive predictivity of 99.88%, and a detection error rate of 0.38%. In addition to this, the proposed technique is quite robust and can adapt to signals with a low signal-to-noise ratio.
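The three figures reported in the abstract are usually derived from beat-level counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below assumes the definitions common in the QRS detection literature; the abstract does not state them, and some papers use a different denominator for the detection error rate. The counts are hypothetical, chosen only so that the arithmetic is easy to follow, and are not taken from the paper.

```python
def qrs_metrics(tp, fp, fn):
    """Return (sensitivity, positive predictivity, detection error rate) in percent.

    Assumed definitions:
      Se  = TP / (TP + FN)
      +P  = TP / (TP + FP)
      DER = (FP + FN) / (TP + FN)   # errors relative to the number of true beats
    """
    total_beats = tp + fn
    se = 100.0 * tp / total_beats
    ppv = 100.0 * tp / (tp + fp)
    der = 100.0 * (fp + fn) / total_beats
    return se, ppv, der

# Hypothetical counts for 10,000 annotated beats (not the paper's data).
se, ppv, der = qrs_metrics(tp=9974, fp=12, fn=26)
```

Under these definitions the detection error rate is roughly the sum of the miss rate (100% minus sensitivity) and the false-alarm rate, which is why a detector can have high sensitivity and still a non-negligible error rate.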
Citations: 3
On the performance of Low Computational Complexity DSI Suppression Techniques Using Satellite Transmitters
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408755
Webert Montlouis, Yingxu Zhu
Direct Signal Interference (DSI) suppression is a necessary step in any passive bistatic radar system. Ground-based bistatic radar suffers significantly from direct signal interference because of the short baseline distance between the transmitter and the receiver, but many other bistatic radar geometries minimize the impact of this physical constraint. Each configuration has its own degree of difficulty and requires a more or less complicated suppression algorithm for a successful implementation. In some configurations, the power level difference between the target signal and the direct path is not as large, so less complicated DSI techniques can pull the target signal out of the interference plus noise. One such bistatic geometry uses the satellite-based bistatic radar concept to perform surveillance in an area of interest close to the ground. This paper investigates the performance, with a DVB-S signal, of a class of iterative algorithms, namely Normalized Least Mean Squares (NLMS), Wiener, Recursive Least Squares (RLS), and Fast Block Least Mean Squares (FBLMS), in suppressing the direct signal when a satellite-based transmitter and a ground-based receiver are used for surveillance.
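Of the algorithms named in the abstract, NLMS is the simplest to sketch. The example below is a generic real-valued NLMS canceller, not the paper's implementation: it assumes a clean reference channel carrying the direct signal and a surveillance channel containing a filtered copy of it. The function name, step size, tap count, and the two-tap direct-path model are all illustrative assumptions, and a real DVB-S signal would be complex-valued.

```python
import random

def nlms_cancel(reference, surveillance, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered reference from the surveillance channel."""
    w = [0.0] * taps
    residual = []
    for n in range(len(surveillance)):
        # Latest `taps` reference samples, newest first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))   # estimated direct-path signal
        e = surveillance[n] - y                    # residual after cancellation
        norm = sum(xk * xk for xk in x) + eps      # input power normalisation
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return w, residual

# Hypothetical scenario: the surveillance channel holds only the direct path,
# modelled as a two-tap filter applied to the reference signal.
random.seed(0)
ref = [random.gauss(0, 1) for _ in range(2000)]
surv = [0.8 * ref[n] + (0.3 * ref[n - 1] if n > 0 else 0.0) for n in range(len(ref))]
weights, residual = nlms_cancel(ref, surv)
tail_power = sum(e * e for e in residual[-200:]) / 200
```

In this noise-free toy case the weights converge to the direct-path taps and the residual decays toward zero. With a target echo present, the residual is what would be passed on to range-Doppler processing.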
Citations: 0
Machine Learning Algorithms for Forecasting COVID 19 Confirmed Cases in America
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408742
Mario Fernando Jojoa Acosta, Begonya García-Zapirain Soto
This paper presents an approach based on Multilayer Perceptron and Support Vector Machine algorithms to predict the number of COVID-19 infections in different countries of the Americas. It is intended to serve as a tool for decision-making in tackling the pandemic that the world is currently facing. The models were trained and tested using open data from the European Union repository, in which a time series of confirmed cases was modeled until May 25, 2020. Hyperparameters such as the number of neurons per layer were set using a tabu list algorithm. The countries selected for the study were Brazil, Chile, Colombia, Mexico, Peru and the United States. The metrics used are Pearson's correlation coefficient (CP), Mean Absolute Error (MAE), and Mean Percentage Error (MPE). For the testing stage we obtained the following results: Brazil, CP=0.65, MAE=2508, MPE=17%; Chile, CP=0.64, MAE=504, MPE=16%; Colombia, CP=0.83, MAE=76, MPE=9%; Mexico, CP=0.77, MAE=231, MPE=9%; Peru, CP=0.76, MAE=686, MPE=18%; and the United States of America, CP=0.93, MAE=799, MPE=4%. These results indicate that the models are powerful machine learning tools, although specific algorithms must be chosen depending on the data and the stage of the pandemic in each country.
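The three evaluation metrics named in the abstract can be computed as in the sketch below. The definitions are the textbook ones and are an assumption here: the abstract does not give formulas, and the paper may use an absolute-value variant of the percentage error. The sample series are invented for illustration and are unrelated to the reported results.

```python
import math

def pearson(actual, predicted):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

def mae(actual, predicted):
    """Mean Absolute Error, in the same units as the data (cases)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mpe(actual, predicted):
    """Mean Percentage Error (signed variant), in percent; requires actual != 0."""
    return 100.0 * sum((a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily case counts and forecasts (not from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 360.0]
scores = (pearson(actual, predicted), mae(actual, predicted), mpe(actual, predicted))
```

MAE depends on the scale of the series, so it is not directly comparable between countries with very different case counts, whereas the correlation coefficient and the percentage error are scale-free. In the signed percentage error, over- and under-predictions can cancel each other out.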
Citations: 2
Time- vs. Frequency-Domain Channel Estimation in MIMO LOS Frequency-Selective Channels
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408923
K. Ntontin, Nikolaos D. Skentos, F. Lazarakis
Motivated by the importance of multiple-input multiple-output (MIMO) line-of-sight (LOS) communication in next-generation backhaul networks, we provide an overhead and performance comparison between time- and frequency-domain channel estimation in a bursty 2x2 LOS environment with single-carrier transmission and frequency-domain equalization at the receiver. For both types of channel estimation, analytical expressions for the weights of the involved equalizers are provided for minimum mean square error equalization. Finally, simulation results comparing their error rates are provided, along with a discussion of their training overhead requirements.
Citations: 0
A TCN-based Primary Ambient Extraction in Generating Ambisonics Audio from Panorama Video
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408696
Zhuliang Lv, Yi Zhou, Hongqing Liu, Xiaofeng Shu, Nannan Zhang
Spatial audio is one of the most essential parts of immersive audio-visual experiences such as virtual reality (VR); it reproduces the inherent spatiality of sound and the correspondence between audio and visual experience. Ambisonics is the dominant spatial audio solution due to its flexibility and fidelity. However, producing Ambisonics audio is difficult for the general public because it requires expensive equipment or professional music production skills. In this work, an end-to-end Ambisonics generator for panorama video is proposed. To improve the perception of directional sound, we assume that the sound field is composed of a primary sound source and an ambient sound without spatiality, and we propose a Temporal Convolutional Network (TCN) based Primary Ambient Extractor (PAE) to separate these two parts of the sound field. The directional sound is spatially encoded by the weights from an audio-visual fusion network and then combined with the ambient part. Our network is evaluated on panorama video clips with first-order Ambisonics. The results show that the proposed approach outperforms other methods in terms of objective evaluations.
Citations: 0
DOAV Estimation Using L-Shaped Antenna Array Configuration
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408702
Webert Montlouis
To estimate the two-dimensional Directions of Arrival (DOA) of plane waves, a planar array is often used. The L-shaped antenna structure provides a mechanism to estimate the 2D parameters without using a fully populated planar array. This antenna array geometry has been studied under the assumption that the source is stationary during the observation interval, where it provides a more computationally efficient 2D DOA estimation. In this paper, we study the L-shaped antenna array when the source is moving rapidly. In this case, we not only perform DOA estimation but also estimate additional parameters such as the angular velocities in azimuth and elevation. We assume white Gaussian background noise and formulate the Maximum Likelihood estimator.
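The signal model behind such an estimator starts from the array steering vector. The sketch below assumes one particular geometry and angle convention that the abstract does not specify: two uniform linear arms along the x and y axes sharing the element at the origin, elevation measured from the z axis, and half-wavelength spacing. The constant-angular-velocity motion model and all function names are likewise assumptions for illustration.

```python
import cmath
import math

def l_shaped_steering(azimuth, elevation, n_x=4, n_y=4, spacing=0.5):
    """Steering vector of an L-shaped array (angles in radians, spacing in wavelengths)."""
    # Direction cosines of the incoming plane wave along the two arms.
    u = math.sin(elevation) * math.cos(azimuth)
    v = math.sin(elevation) * math.sin(azimuth)
    arm_x = [cmath.exp(2j * math.pi * spacing * m * u) for m in range(n_x)]
    # The y arm starts at index 1 because the origin element is shared.
    arm_y = [cmath.exp(2j * math.pi * spacing * m * v) for m in range(1, n_y)]
    return arm_x + arm_y

def moving_source_steering(t, az0, el0, az_rate, el_rate, **array_kwargs):
    """Steering vector at time t for a source with constant angular velocities."""
    return l_shaped_steering(az0 + az_rate * t, el0 + el_rate * t, **array_kwargs)

# Hypothetical source: starts at 30 deg azimuth, 60 deg elevation, and drifts
# at 0.2 rad/s in azimuth and 0.1 rad/s in elevation.
a_start = moving_source_steering(0.0, math.radians(30), math.radians(60), 0.2, 0.1)
a_later = moving_source_steering(0.5, math.radians(30), math.radians(60), 0.2, 0.1)
```

With a moving source the steering vector changes from snapshot to snapshot, so the unknowns grow from two (azimuth, elevation) to four (the two angles plus their rates). An L-shaped array with `n_x + n_y - 1` elements keeps the sensor count well below that of a full `n_x` by `n_y` planar array.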
Citations: 1
A Time-Varying Forgetting Factor-Based QRRLS Algorithm for Multichannel Speech Dereverberation
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408971
Xinyu Tang, Yang Xu, Rilin Chen, Yi Zhou
In this paper, we propose an adaptive multichannel linear prediction (MCLP) algorithm based on the QR-decomposition recursive least squares (QRRLS) approach for online speech dereverberation, in which a time-varying forgetting factor (VFF) control scheme is devised to adapt to dynamic acoustic scenarios. Because it avoids the numerical instability inherent to RLS-based MCLP, the QRRLS-based MCLP method is more robust while retaining the same arithmetic complexity and fast convergence as the RLS-based methods. The VFF scheme, based on the approximated derivatives of the filter coefficients, updates the forgetting factor over time so that it can effectively track the varying reflection paths. Experimental results show that the proposed VFF-QRRLS-based MCLP algorithm improves speech dereverberation performance and also offers fast tracking and numerical robustness compared with conventional adaptive MCLP algorithms.
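For orientation, the conventional RLS recursion that the abstract takes as its starting point is sketched below with a fixed forgetting factor `lam`. This is the textbook algorithm on a toy system-identification problem. It is neither the QR-decomposition variant nor the time-varying forgetting factor scheme proposed in the paper, and the names and parameter values are assumptions.

```python
import random

def rls(inputs, desired, taps=2, lam=0.98, delta=100.0):
    """Exponentially weighted RLS with a fixed forgetting factor `lam`."""
    w = [0.0] * taps
    # Inverse correlation matrix estimate, initialised to delta * identity.
    P = [[delta if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    errors = []
    for n in range(len(desired)):
        x = [inputs[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        Px = [sum(P[i][j] * x[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(x[i] * Px[i] for i in range(taps))
        gain = [p / denom for p in Px]
        e = desired[n] - sum(w[i] * x[i] for i in range(taps))  # a priori error
        w = [w[i] + gain[i] * e for i in range(taps)]
        # P stays symmetric, so x^T P equals Px transposed.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(taps)]
             for i in range(taps)]
        errors.append(e)
    return w, errors

# Hypothetical two-tap system to identify (not a dereverberation setup).
random.seed(1)
x_in = [random.gauss(0, 1) for _ in range(500)]
d_out = [0.5 * x_in[n] - (0.2 * x_in[n - 1] if n > 0 else 0.0) for n in range(500)]
w_hat, err = rls(x_in, d_out)
```

A forgetting factor closer to 1 averages over a longer history and gives smoother estimates, while a smaller value tracks changes faster at the cost of noisier weights. That trade-off is the motivation for adapting the factor over time. The explicit update of `P` is the step where round-off errors can accumulate, which is what QR-based formulations are designed to avoid.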
Citations: 1
Aging Estimation of an AC Adapter from Generated Electromagnetic Noise
Pub Date : 2020-12-09 DOI: 10.1109/ISSPIT51521.2020.9408685
F. Ishiyama, Y. Toriumi
Capacitors are the parts of a power supply unit that deteriorate most easily. Among types of power supply unit, AC adapters are the ones for which it is not possible to check the leakage or bulging of capacitors, because they are sealed and invisible. Therefore, we focused on the electromagnetic noise which deteriorated AC adapters emit on the power line. We measured their noise and analyzed them with our own method of mode decomposition. It was found that the intensity of the noise is proportional to the internal resistance of the deteriorated capacitors measured in the hot condition.
Citations: 2