Latest publications: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

An Improved Doa Estimator Based on Partial Relaxation Approach
Minh Trinh-Hoang, M. Viberg, M. Pesavento
In the partial relaxation approach, at each desired direction, the manifold structure of the remaining interfering signals impinging on the sensor array is relaxed, which results in closed-form estimates of the interference parameters. Adopting this approach, this paper proposes a new estimator based on the unconstrained covariance fitting problem. To obtain the null-spectra efficiently, an iterative rooting scheme based on rational function approximation is applied. Simulation results show that the proposed estimator outperforms the classical and other partial relaxation methods, especially for a low number of snapshots, irrespective of the specific structure of the sensor array, while maintaining a reasonable computational cost.
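As context for the kind of null-spectrum search being improved on, here is a minimal baseline sketch (classical MUSIC on a uniform linear array, not the authors' partial-relaxation estimator); the array geometry, scan grid, and toy data are illustrative assumptions:

```python
# Baseline sketch: classical MUSIC null-spectrum scan for a half-wavelength ULA.
import numpy as np

def music_spectrum(R, n_sources, n_grid=360):
    """Return scan angles (degrees) and the MUSIC pseudo-spectrum for a ULA,
    given a sample covariance matrix R."""
    M = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    En = eigvecs[:, : M - n_sources]           # noise subspace
    angles = np.linspace(-90, 90, n_grid)
    spectrum = np.empty(n_grid)
    for i, theta in enumerate(np.deg2rad(angles)):
        a = np.exp(1j * np.pi * np.arange(M) * np.sin(theta))  # steering vector
        null = np.linalg.norm(En.conj().T @ a) ** 2            # null-spectrum
        spectrum[i] = 1.0 / null
    return angles, spectrum

# Toy usage: two sources at -10 and 20 degrees, 8 sensors, 50 snapshots.
rng = np.random.default_rng(0)
M, N, doas = 8, 50, np.deg2rad([-10.0, 20.0])
A = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(doas)))
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
X = A @ S + 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
R = X @ X.conj().T / N
angles, spec = music_spectrum(R, n_sources=2)
peaks = [i for i in range(1, len(spec) - 1) if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
best = sorted(peaks, key=lambda i: spec[i])[-2:]
print(np.sort(angles[best]))  # should be near [-10, 20]
```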
Citations: 11
Time-Varying Delay Estimation Using Common Local All-Pass Filters with Application to Surface Electromyography
Christopher Gilliam, Adrian Bingham, T. Blu, B. Jelfs
Estimation of conduction velocity (CV) is an important task in the analysis of surface electromyography (sEMG). The problem can be framed as estimation of a time-varying delay (TVD) between electrode recordings. In this paper we present an algorithm that incorporates information from multiple electrodes into a single TVD estimate. The algorithm uses a common all-pass filter to relate two groups of signals at a local level. We also address a current limitation of CV estimators by providing an automated way of identifying the innervation zone from a set of electrode recordings, thus allowing the entire array to be incorporated into the estimation. We validate the algorithm on both synthetic and real sEMG data, with results showing that the proposed algorithm is both robust and accurate.
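For orientation, a hedged baseline sketch of time-varying delay estimation (windowed cross-correlation with parabolic peak interpolation, not the authors' common local all-pass method); the window, hop, and lag range are arbitrary assumptions:

```python
# Baseline sketch: TVD estimation via windowed cross-correlation.
import numpy as np

def tvd_xcorr(x, y, win=256, hop=128, max_lag=16):
    """Estimate the delay of y relative to x (in samples) in sliding windows.
    Uses circular shifts within each window, so edges are slightly biased."""
    delays = []
    for start in range(0, len(x) - win, hop):
        xs, ys = x[start:start + win], y[start:start + win]
        lags = np.arange(-max_lag, max_lag + 1)
        corr = np.array([np.dot(xs, np.roll(ys, -l)) for l in lags])
        k = int(np.argmax(corr))
        # Parabolic interpolation around the peak for sub-sample resolution.
        if 0 < k < len(corr) - 1:
            denom = corr[k - 1] - 2 * corr[k] + corr[k + 1]
            frac = 0.5 * (corr[k - 1] - corr[k + 1]) / denom if denom != 0 else 0.0
        else:
            frac = 0.0
        delays.append(lags[k] + frac)
    return np.array(delays)

# Toy usage: y is x delayed by 3 samples, so estimates should be near 3.
t = np.arange(4000)
x = np.sin(2 * np.pi * 0.01 * t)
y = np.roll(x, 3)
print(tvd_xcorr(x, y)[:4])
```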
Citations: 4
A Deeper Look at Gaussian Mixture Model Based Anti-Spoofing Systems
Bhusan Chettri, Bob L. Sturm
A “replay attack” involves replaying pre-recorded speech of an enrolled speaker to bypass an automatic speaker verification system. The 2017 ASVspoof Challenge focused on this kind of attack. In this paper, we describe our evaluation work after this challenge. First, we study the effectiveness of Gaussian Mixture Model (GMM) systems using six different hand-crafted features for detecting a replay attack. Second, we take a deeper look at these GMM systems and perform a frame-level analysis of log-likelihoods. Our analysis shows how system performance can depend on a simple class-dependent cue in the dataset: initial silence frames of zeros appear in the genuine signals but are missing in the spoofed versions. Third, we show how we can fool these systems using this cue. For example, we find that the equal error rate (EER) of one GMM system rises dramatically from 14.82 to 44.44 when we add the cue to the evaluation data. Finally, we explore whether this problem can be mitigated by pre-processing the 2017 ASVspoof Challenge dataset.
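A minimal sketch of a generic two-class GMM back-end of the kind analyzed here, assuming random stand-in features in place of the paper's six hand-crafted feature sets and the challenge data:

```python
# Generic GMM anti-spoofing back-end with frame-level log-likelihood analysis.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
genuine_frames = rng.normal(0.0, 1.0, size=(2000, 20))   # stand-in features
spoofed_frames = rng.normal(0.5, 1.2, size=(2000, 20))

gmm_gen = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(genuine_frames)
gmm_spf = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(spoofed_frames)

test = rng.normal(0.0, 1.0, size=(300, 20))               # one test utterance
frame_llr = gmm_gen.score_samples(test) - gmm_spf.score_samples(test)
print(frame_llr.mean())   # utterance-level score: > 0 leans "genuine"
# Inspecting frame_llr over time exposes frames (e.g., leading silence) that
# dominate the decision -- the kind of class-dependent cue the paper identifies.
```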
Citations: 16
Adaptive Coding of Non-Negative Factorization Parameters with Application to Informed Source Separation
Max Bläser, Christian Rohlfing, Yingbo Gao, M. Wien
Informed source separation (ISS) uses source separation to extract audio objects from their downmix, given some pre-computed parameters. In recent years, non-negative tensor factorization (NTF) has proven to be a good choice for compressing audio objects at the encoding stage. At the decoding stage, these parameters are used to separate the downmix with Wiener filtering. The quantized NTF parameters have to be encoded into a bit stream prior to transmission. In this paper, we propose to use context-based adaptive binary arithmetic coding (CABAC) for this task. CABAC is widely used in the video coding community and exploits local signal statistics. We adapt CABAC to the task of NTF-based ISS and show that our contribution outperforms reference coding methods.
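A hedged sketch of the Wiener-filtering decode step, assuming decoded per-source factorization parameters are available; the magnitude-ratio soft mask below is a generic formulation (power-domain masks are also common), and the quantization/CABAC bit-stream side is omitted:

```python
# Decode-side separation: build soft masks from per-source NTF/NMF models.
import numpy as np

def wiener_separate(X_mix, models, eps=1e-12):
    """X_mix: complex STFT of the downmix (freq x time).
    models: list of per-source (W, H) pairs giving magnitude models V_j = W_j @ H_j.
    Returns a list of complex source STFT estimates."""
    V = [np.maximum(W @ H, eps) for (W, H) in models]  # per-source magnitudes
    V_total = np.sum(V, axis=0)
    return [X_mix * (Vj / V_total) for Vj in V]        # soft Wiener-style masks

# Toy usage with random parameters for two sources.
rng = np.random.default_rng(0)
F_, T_, K = 257, 100, 2
X = rng.standard_normal((F_, T_)) + 1j * rng.standard_normal((F_, T_))
models = [(np.abs(rng.standard_normal((F_, 4))),
           np.abs(rng.standard_normal((4, T_)))) for _ in range(K)]
sources = wiener_separate(X, models)
print(sources[0].shape)  # (257, 100)
```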
Citations: 1
A Hybrid Neural Network Based on the Duplex Model of Pitch Perception for Singing Melody Extraction
Hsin Chou, Ming-Tso Chen, T. Chi
In this paper, we build a hybrid neural network (NN) for singing melody extraction from polyphonic music by imitating human pitch perception. In human hearing, there are two pitch perception models, the spectral model and the temporal model, corresponding to whether or not harmonics are resolved. Here, we first use NNs to implement the individual models and evaluate their performance on the task of singing melody extraction. Then, we combine the NNs into a composite NN that simulates the duplex model, in which the temporal model complements the spectral model's pitch perception of unresolved harmonics. Simulation results show the proposed composite NN outperforms other conventional methods in singing melody extraction.
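A minimal sketch of the composite-network idea, assuming PyTorch and toy layer sizes; the two branches stand in for the spectral and temporal models, and the fusion head for the duplex combination (none of this reproduces the paper's exact architecture):

```python
# Two-branch "duplex" network: spectral branch on a spectrum frame,
# temporal branch on a waveform frame, fused for pitch-class logits.
import torch
import torch.nn as nn

class DuplexNet(nn.Module):
    def __init__(self, n_bins=128, n_pitch=60):
        super().__init__()
        self.spectral = nn.Sequential(                 # cue: resolved harmonics
            nn.Linear(n_bins, 256), nn.ReLU(), nn.Linear(256, 128))
        self.temporal = nn.Sequential(                 # cue: unresolved harmonics
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(), nn.Linear(16 * 8, 128))
        self.head = nn.Linear(256, n_pitch)            # fuse both cues

    def forward(self, spec_frame, wave_frame):
        s = self.spectral(spec_frame)                  # (B, 128)
        t = self.temporal(wave_frame.unsqueeze(1))     # (B, 128)
        return self.head(torch.cat([s, t], dim=1))     # (B, n_pitch)

logits = DuplexNet()(torch.randn(4, 128), torch.randn(4, 1024))
print(logits.shape)  # torch.Size([4, 60])
```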
Citations: 13
A 203 FPS VLSI Architecture of Improved Dense Trajectories for Real-Time Human Action Recognition
Zhi-Yi Lin, Jia-Lin Chen, Liang-Gee Chen
This paper introduces an architecture with high throughput, low on-chip memory, and efficient data access for Improved Dense Trajectories (iDT) as a video representation for real-time action recognition. The iDT feature can capture long-term motion cues better than any existing deep feature, which makes it crucial in state-of-the-art action recognition systems. Our architecture design has three major features: low-bandwidth frame-wise feature extraction, a low on-chip-memory architecture for point tracking, and a two-stage trajectory-pruning architecture for low bandwidth. Using TSMC 40 nm technology, the chip area is 3.1 mm² and the on-chip memory is 40.8 kB. The chip supports video at a resolution of 320×240 with a throughput of 203 fps at 215 MHz, an 81.2× speedup over a CPU. At the same operating frequency, it can also provide feature extraction for six windows of size 320×240 in higher-resolution videos with a throughput of 34 fps.
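A reader-side sanity check of the cycle budget implied by the reported numbers (simple arithmetic, not a figure from the paper):

```python
# Cycle budget implied by 203 fps at 215 MHz for 320x240 frames.
clock_hz = 215e6
fps = 203
cycles_per_frame = clock_hz / fps                    # ~1.06e6 cycles per frame
pixels = 320 * 240
print(cycles_per_frame, cycles_per_frame / pixels)   # ~13.8 cycles per pixel
```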
Citations: 0
A Generative Auditory Model Embedded Neural Network for Speech Processing
Yu-Wen Lo, Yih-Liang Shen, Y. Liao, T. Chi
Before the era of the neural network (NN), features extracted from auditory models were applied to various speech applications and demonstrated to be more robust against noise than conventional speech-processing features. What is the role of auditory models in the current NN era? Are they obsolete? To answer this question, we construct a NN with an embedded generative auditory model to process speech signals. The generative auditory model consists of two stages: spectrum estimation along a logarithmic frequency axis by the cochlea, and spectro-temporal analysis in the modulation domain by the auditory cortex. The NN is evaluated on a simple speaker identification task. Experiment results show that the auditory-model-embedded NN is still more robust against noise than the randomly initialized NN in speaker identification, especially in low-SNR conditions.
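A hedged, minimal stand-in for the cochlear stage only: spectrum estimation on a logarithmic frequency axis using triangular filters over an FFT magnitude spectrum. The paper's auditory model (including the cortical modulation stage) is richer than this; band count and edges are arbitrary assumptions:

```python
# Log-frequency spectrum estimation: a crude cochlea-stage stand-in.
import numpy as np

def log_freq_spectrum(frame, sr=16000, n_fft=512, n_bands=48, fmin=60.0):
    mag = np.abs(np.fft.rfft(frame, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = np.geomspace(fmin, sr / 2, n_bands + 2)    # log-spaced band edges
    out = np.zeros(n_bands)
    for b in range(n_bands):
        lo, c, hi = edges[b], edges[b + 1], edges[b + 2]
        up = np.clip((freqs - lo) / (c - lo), 0, 1)    # rising slope
        down = np.clip((hi - freqs) / (hi - c), 0, 1)  # falling slope
        out[b] = np.sum(mag * np.minimum(up, down))    # triangular weighting
    return np.log(out + 1e-8)

print(log_freq_spectrum(np.random.randn(512)).shape)   # (48,)
```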
Citations: 1
The Asynchronous Power Iteration: A Graph Signal Perspective
Oguzhan Teke, P. Vaidyanathan
This paper considers an autonomous network in which the nodes communicate only with their neighbors at random time instances, repeatedly and independently. Polynomial graph filters studied in the context of graph signal processing are inadequate for analyzing signals on this type of network. This is due to the fact that the basic shift on a graph requires all the nodes to communicate at the same time, which cannot be assumed in an autonomous setting. In order to analyze these types of networks, this paper studies an asynchronous power iteration that updates the values of only a subset of nodes. The paper further reveals the close connection between asynchronous updates and the notion of smooth signals on the graph, and shows that a cascade of random asynchronous updates smooths out any arbitrary signal on the graph.
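A minimal sketch of the randomized asynchronous update studied here, assuming a toy symmetric graph operator scaled to unit spectral norm; the subset size and iteration count are arbitrary, and the exact signal model follows the paper rather than this toy:

```python
# Asynchronous power iteration: only a random subset of nodes updates per step.
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.random((n, n)) < 0.1                       # random graph edges
A = ((A | A.T) & ~np.eye(n, dtype=bool)).astype(float)
A /= np.linalg.norm(A, 2)                          # scale so |eigenvalues| <= 1

x = rng.standard_normal(n)                         # arbitrary initial signal
for _ in range(2000):
    S = rng.choice(n, size=5, replace=False)       # the only nodes that update
    x[S] = (A @ x)[S]                              # asynchronous partial shift

# x converges (up to sign/scale) toward the dominant eigenvector of A -- a
# "smooth" graph signal, echoing the paper's smoothing result.
v = np.linalg.eigh(A)[1][:, -1]
print(abs(v @ x) / (np.linalg.norm(x) * np.linalg.norm(v) + 1e-12))  # near 1
```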
Citations: 6
Multi-Scale Object Detection with Feature Fusion and Region Objectness Network
W. Guan, Yuexian Zou, Xiaoqun Zhou
Though tremendous progress has been made in object detection thanks to deep convolutional networks, multi-scale object detection (MOD) remains a challenge. To improve performance on the MOD task, we adopt the Faster region-based CNN (Faster R-CNN) framework and address two specific problems that arise when many small objects are present: obtaining more accurate localization for small objects and eliminating background region proposals. Specifically, a feature fusion module is introduced that jointly utilizes the highly abstracted semantic knowledge captured in higher layers and the detail information captured in lower layers to generate fine-resolution feature maps. As a result, small objects can be localized more accurately. Besides, a novel Region Objectness Network is developed to generate effective proposals that are more likely to cover the target objects. Extensive experiments have been conducted on the UA-DETRAC car dataset, as well as a self-built bird dataset (BSBDV 2017) collected from the Shenzhen Bay coastal wetland, demonstrating the competitive performance and comparable detection speed of our proposed method.
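A hedged sketch of the feature-fusion idea (illustrative channel counts and shapes, not the paper's exact module): upsample semantically strong high-level features and merge them with detail-rich low-level features to obtain a fine-resolution map:

```python
# Fuse a coarse, semantically strong map with a fine, detail-rich map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, c_low=256, c_high=512, c_out=256):
        super().__init__()
        self.reduce_high = nn.Conv2d(c_high, c_out, kernel_size=1)
        self.reduce_low = nn.Conv2d(c_low, c_out, kernel_size=1)
        self.smooth = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, feat_low, feat_high):
        # feat_low: (B, c_low, H, W); feat_high: (B, c_high, H/2, W/2)
        high_up = F.interpolate(self.reduce_high(feat_high),
                                size=feat_low.shape[-2:], mode="nearest")
        return self.smooth(self.reduce_low(feat_low) + high_up)

fused = FeatureFusion()(torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32))
print(fused.shape)  # torch.Size([1, 256, 64, 64])
```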
Citations: 8
CBLDNN-Based Speaker-Independent Speech Separation Via Generative Adversarial Training
Chenxing Li, Lei Zhu, Shuang Xu, Peng Gao, Bo Xu
In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on a convolutional, bidirectional long short-term memory, deep feedforward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims at obtaining better speech quality instead of only minimizing the mean square error (MSE). In the initial phase, we utilize log-mel filterbank and pitch features to warm up our CBLDNN in a multi-task manner. Thus, information that contributes to separating speech and improving speech quality is integrated into the model. We apply GAT throughout training, which makes the separated speech indistinguishable from the real speech. We evaluate CBLDNN-GAT on the WSJ0-2mix dataset. The experimental results show that the proposed model achieves an 11.0 dB signal-to-distortion ratio (SDR) improvement, which is the new state-of-the-art result.
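A heavily simplified sketch of the adversarial training loop, assuming PyTorch; the separator and discriminator below are tiny placeholders, not the paper's CBLDNN, and the data are random stand-ins:

```python
# One discriminator step and one generator step of GAN-style separation training.
import torch
import torch.nn as nn

sep = nn.GRU(129, 129, batch_first=True)             # stand-in separator
disc = nn.Sequential(nn.Linear(129, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(sep.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

mix = torch.randn(8, 100, 129)                       # (batch, frames, bins)
clean = torch.randn(8, 100, 129)

# Discriminator step: real frames -> 1, separated frames -> 0.
est, _ = sep(mix)
loss_d = bce(disc(clean), torch.ones(8, 100, 1)) + \
         bce(disc(est.detach()), torch.zeros(8, 100, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: reconstruction loss plus "fool the discriminator" loss.
est, _ = sep(mix)
loss_g = nn.functional.mse_loss(est, clean) + bce(disc(est), torch.ones(8, 100, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(float(loss_d), float(loss_g))
```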
Citations: 37