
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Detecting Adversarial Attacks In Time-Series Data
Mubarak G. Abdu-Aguye, W. Gomaa, Yasushi Makihara, Y. Yagi
In recent times, deep neural networks have seen increased adoption in highly critical tasks. They are also susceptible to adversarial attacks: specifically crafted changes to input samples that lead such models to produce erroneous output. Such attacks have been shown to affect different types of data, such as images and, more recently, time-series data. This susceptibility could have catastrophic consequences, depending on the domain. We propose a method for detecting Fast Gradient Sign Method (FGSM) and Basic Iterative Method (BIM) adversarial attacks as adapted for time-series data. We frame the problem as an instance of outlier detection and construct a normalcy model based on information-theoretic and chaos-theoretic measures, which can then be used to determine whether unseen samples are normal or adversarial. Our approach shows promising performance on several datasets from the 2015 UCR Time Series Archive, reaching up to 97% detection accuracy in the best case.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053311 | pp. 3092-3096 | Published: 2020-05-01
Citations: 11
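The abstract above names FGSM and BIM without spelling them out. The following is a minimal sketch of both attacks, assuming a toy linear logistic classifier over a short series (the model, weights, and parameter names `eps`, `alpha`, and `steps` are illustrative assumptions, not taken from the paper). FGSM takes one step of size `eps` along the sign of the loss gradient; BIM repeats smaller steps and clips the result back into the `eps`-ball around the original sample.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_gradient(x, w, b, y):
    """Gradient of the logistic cross-entropy loss with respect to the input x.

    Assumes a hypothetical linear model p = sigmoid(w . x + b) with label y in {0, 1}.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(x, w, b, y, eps):
    """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
    g = loss_gradient(x, w, b, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def bim(x, w, b, y, eps, alpha, steps):
    """Basic Iterative Method: repeated steps of size alpha, clipped to the eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_gradient(x_adv, w, b, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Keep every time step within eps of the original sample.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

For example, `fgsm([0.5, -0.2, 0.1], [1.0, -2.0, 0.5], 0.0, 1, 0.1)` moves every time step by exactly `eps` in the direction that increases the loss for the true label.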
Channel Charting: an Euclidean Distance Matrix Completion Perspective
Patrick Agostini, Z. Utkovski, S. Stańczak
Channel charting (CC) is an emerging machine learning framework that aims at learning lower-dimensional representations of the radio geometry from channel state information (CSI) collected in an area of interest, such that spatial relations of the representations in the different domains are preserved. Extracting features capable of correctly representing spatial properties between positions is crucial for learning reliable channel charts. Most approaches to CC in the literature rely on range distance estimates, which have the drawback that they only provide accurate distance information for collinear positions. Distances between positions with large azimuth separation are consistently underestimated by these approaches, and such positions are thus incorrectly mapped to close neighborhoods. In this paper, we introduce a correlation matrix distance (CMD) based dissimilarity measure for CC that allows us to group CSI measurements according to their collinearity. This lets us discard points for which large distance errors are made, and build a neighborhood graph between approximately collinear positions. The neighborhood graph allows us to state the problem of CC as an instance of a Euclidean distance matrix completion (EDMC) problem where side-information can be naturally introduced via convex box-constraints.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053639 | pp. 5010-5014 | Published: 2020-05-01
Citations: 14
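The channel charting abstract relies on a correlation matrix distance (CMD). A common definition of CMD between two correlation matrices is one minus their normalized trace inner product; the sketch below assumes that definition and, for simplicity, real symmetric matrices stored as nested lists (the paper may use complex-valued CSI covariances and a different exact form).

```python
import math


def frobenius_norm(m):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in m for v in row))


def correlation_matrix_distance(r1, r2):
    """CMD(R1, R2) = 1 - tr(R1 R2) / (||R1||_F * ||R2||_F).

    Returns 0 when the matrices are equal up to a positive scale factor
    and 1 when their trace inner product vanishes.
    """
    n = len(r1)
    trace = sum(r1[i][k] * r2[k][i] for i in range(n) for k in range(n))
    return 1.0 - trace / (frobenius_norm(r1) * frobenius_norm(r2))
```

Under this definition, measurements whose correlation matrices give a small CMD would be grouped together, which is the role the abstract assigns to the dissimilarity measure.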
A Fast and Accurate Frequent Directions Algorithm for Low Rank Approximation via Block Krylov Iteration
Qianxin Yi, Chenhao Wang, Xiuwu Liao, Yao Wang
Frequent directions (FD) is a popular deterministic matrix sketching technique for low-rank approximation. However, FD and its randomized variants usually incur high computational cost or suffer from computational instability on large-scale datasets, which limits their use in practice. To remedy these issues, this paper aims to improve the efficiency and effectiveness of FD. Specifically, by combining Block Krylov Iteration with count sketch techniques, we propose a fast and accurate FD algorithm dubbed BKICS-FD. We derive the error bound of the proposed BKICS-FD and then carry out extensive numerical experiments to illustrate its superiority over several popular FD algorithms in terms of both computational speed and accuracy.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054022 | pp. 3167-3171 | Published: 2020-05-01
Citations: 0
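Of the two ingredients the BKICS-FD abstract mentions, count sketch is the easier one to show in isolation. The sketch below is a generic count sketch of the rows of a matrix, not the paper's algorithm: each input row is hashed to one of `m` buckets and added there with a random sign. The function name, the seeded `random.Random` generator, and the list-of-rows layout are assumptions made for illustration.

```python
import random


def count_sketch(rows, m, seed=0):
    """Compress an n x d matrix (list of rows) into an m x d sketch.

    Each input row is assigned to one random bucket and multiplied by a
    random sign before being accumulated into that bucket.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    sketch = [[0.0] * d for _ in range(m)]
    for row in rows:
        bucket = rng.randrange(m)
        s = rng.choice((-1.0, 1.0))
        for j, value in enumerate(row):
            sketch[bucket][j] += s * value
    return sketch
```

Because each row touches a single bucket, the sketch costs time proportional to the number of entries in the input, which is why count sketch is attractive as a cheap preprocessing step before a more expensive factorization.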
Neural Lattice Search for Speech Recognition
Rao Ma, Hao Li, Qi Liu, Lu Chen, Kai Yu
To improve the accuracy of automatic speech recognition, a two-pass decoding strategy is widely adopted. The first-pass model generates compact word lattices, which are utilized by the second-pass model to perform rescoring. Currently, the most popular rescoring methods are N-best rescoring and lattice rescoring with long short-term memory language models (LSTMLMs). However, these methods suffer from a limited search space or from inconsistency between training and evaluation. In this paper, we address these problems with an end-to-end model for accurately extracting the best hypothesis from the word lattice. Our model is composed of a bidirectional LatticeLSTM encoder followed by an attentional LSTM decoder. The model takes a word lattice as input and generates the single best hypothesis from the given lattice space. When combined with an LSTMLM, the proposed model yields relative WER reductions of 9.7% and 7.5% compared to N-best rescoring and lattice rescoring methods, respectively, within the same amount of decoding time.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054109 | pp. 7794-7798 | Published: 2020-05-01
Citations: 4
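The results above are reported as relative word error rate (WER) reductions. As a reminder of what those numbers measure, here is a small sketch that computes WER as word-level edit distance divided by reference length, and a relative reduction as the fraction of the baseline error that is removed. The example sentences and values used with it are made up for illustration and are not results from the paper.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer
```

With these definitions, going from a hypothetical baseline WER of 10.0% to 9.0% is a 10% relative reduction, even though the absolute difference is only one percentage point.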
DNN-Based Speech Recognition for Globalphone Languages
Martha Yifiru Tachbelie, Ayimunishagu Abulimiti, S. Abate, Tanja Schultz
This paper describes new reference benchmark results based on hybrid Hidden Markov Model and Deep Neural Network (HMM-DNN) models for the GlobalPhone (GP) multilingual text and speech database. GP is a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in more than 20 languages. Moreover, we provide new results for five additional languages, namely Amharic, Oromo, Tigrigna, Wolaytta, and Uyghur. Across the 22 languages considered, the hybrid HMM-DNN models outperform the HMM-GMM based models regardless of the amount of training speech used. Overall, we achieved relative improvements that range from 7.14% to 59.43%.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053144 | pp. 8269-8273 | Published: 2020-05-01
Citations: 8
Multi-Scale Residual Network for Image Classification
X. Zhong, Oubo Gong, Wenxin Huang, Jingling Yuan, Bo Ma, R. W. Liu
The multi-scale approach, which represents image objects at various levels of detail, has been applied to various computer vision tasks. Existing image classification approaches place more emphasis on multi-scale convolution kernels and overlook multi-scale feature maps. As a result, some information from the shallower layers of the network is not fully utilized. In this paper, we propose the Multi-Scale Residual (MSR) module, which integrates multi-scale feature maps of the underlying information into the last layer of a Convolutional Neural Network. Our proposed method significantly enhances the characteristics of the information used in the final classification. Extensive experiments conducted on the CIFAR100, Tiny-ImageNet and large-scale CalTech-256 datasets demonstrate the effectiveness of our method compared with Res-Family.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053478 | pp. 2023-2027 | Published: 2020-05-01
Citations: 4
A Time-Frequency Network with Channel Attention and Non-Local Modules for Artificial Bandwidth Extension
Yuanjie Dong, Yaxing Li, Xiaoqi Li, Shanjie Xu, Dan Wang, Zhihui Zhang, Shengwu Xiong
Convolutional neural networks (CNNs) have recently been attracting increasing attention for the artificial bandwidth extension (ABE) task. However, these methods use the flipped low-frequency phase to reconstruct speech signals, which may lead to the well-known invalid short-time Fourier transform (STFT) problem. Moreover, convolutional operations only enable networks to construct informative features by fusing channel-wise and spatial information within local receptive fields at each layer. In this paper, we introduce a Time-Frequency Network (TFNet) with channel attention (CA) and non-local (NL) modules for ABE. The TFNet exploits the information from both time and frequency domain branches concurrently to avoid the invalid STFT problem. To capture channel and spatial dependencies, we incorporate the CA and NL modules to construct a fully convolutional neural network for the time and frequency branches of TFNet. Experimental results demonstrate that the proposed method outperforms the competing method.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053769 | pp. 6954-6958 | Published: 2020-05-01
Citations: 5
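The abstract does not describe the internals of its channel attention (CA) module. A widely used form of channel attention is the squeeze-and-excitation pattern: average each channel, pass the averages through a small two-layer network, and rescale every channel by the resulting gate. The sketch below assumes that pattern; the weight matrices `w1` and `w2`, the absence of biases, and the list-based layout are illustrative choices, not the paper's design.

```python
import math


def channel_attention(features, w1, w2):
    """Rescale each channel by a learned gate in (0, 1).

    features: list of channels, each a list of values (e.g. time steps).
    w1: hidden x channels weight matrix; w2: channels x hidden weight matrix.
    """
    # Squeeze: global average pooling over each channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: linear layer + ReLU, then linear layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]
```

With all-zero weights every gate equals 0.5, so the output is simply half the input; training would move the gates so that informative channels pass through more strongly than others.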
Motion Feedback Design for Video Frame Interpolation
Mengshun Hu, Liang Liao, Jing Xiao, Lin Gu, S. Satoh
This paper introduces a feedback-based approach to interpolate video frames involving small and fast-moving objects. Unlike the existing feedforward-based methods that estimate optical flow and synthesize in-between frames sequentially, we introduce a motion-oriented component that adds a feedback block to the existing multi-scale autoencoder pipeline; this block feeds back information about small objects shared between architectures of two different scales. We show that feeding this additional information enables more robust detection of optical flow caused by small objects in fast motion. Using experiments on various datasets, we show that the feedback mechanism allows our method to achieve state-of-the-art results, both qualitatively and quantitatively.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053223 | pp. 4347-4351 | Published: 2020-05-01
Citations: 6
Dynamic Channel Pruning For Correlation Filter Based Object Tracking
Goutam Yelluru Gopal, Maria A. Amer
Fusion of multi-channel representations has played a crucial role in the success of correlation filter (CF) based trackers. However, not all channels contain useful information for target localization at every frame. During challenging scenarios, ambiguous responses of non-discriminative or unreliable channels lead to erroneous results and cause tracker drift. To mitigate this problem, we propose a method for dynamic channel pruning through online (i.e., at every frame) learning of channel weights. Our method uses estimated reliability scores to compute channel weights that nullify the impact of highly unreliable channels. The proposed method for learning channel weights is modeled as a non-smooth convex optimization problem. We then propose an algorithm that solves the resulting problem more efficiently than off-the-shelf solvers. Results on the VOT2018 and TC128 datasets show that the proposed method improves the performance of baseline CF trackers.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053333 | pp. 5700-5704 | Published: 2020-05-01
Citations: 1
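The abstract does not state the optimization problem it solves, so the following is not the paper's method. It only illustrates how a convex formulation can drive the weights of unreliable channels to exactly zero: Euclidean projection of a score vector onto the probability simplex, computed with the standard sort-and-threshold procedure. The function name and the example scores are assumptions.

```python
def project_to_simplex(scores):
    """Euclidean projection of scores onto {w : w >= 0, sum(w) = 1}.

    Sort-based algorithm: find the threshold theta such that
    sum(max(s - theta, 0)) equals 1, then clip at zero.
    """
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # the last j that passes the test gives the threshold
    return [max(s - theta, 0.0) for s in scores]
```

For hypothetical reliability scores `[0.9, 0.8, 0.1]` the projection assigns weights of roughly 0.55 and 0.45 to the first two channels and zero to the third, so the least reliable channel is pruned rather than merely down-weighted.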
Deriving Compact Feature Representations Via Annealed Contraction
Muhammad A Shah, B. Raj
It is common practice to use pretrained image recognition models to compute feature representations for visual data. The size of the feature representations can have a noticeable impact on the complexity of the models that use these representations, and by extension on their deployability and scalability. Therefore, it would be beneficial to have compact visual representations that carry as much information as their high-dimensional counterparts. To this end, we propose a technique that shrinks a layer by an iterative process in which neurons are removed from the layer and the network is fine-tuned. Using this technique, we are able to remove 99% of the neurons from the penultimate layer of AlexNet and VGG16, while suffering less than a 5% drop in accuracy on CIFAR10, Caltech101 and Caltech256. We also show that our method can reduce the size of AlexNet by 95% while only suffering a 4% reduction in accuracy on Caltech101.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054527 | pp. 2068-2072 | Published: 2020-05-01
Citations: 7
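The iterative "remove neurons, then fine-tune" loop described above can be outlined as follows. This is a schematic sketch only: the abstract does not say how neurons are chosen or how much is removed per round, so the selection rule (keep the neurons with the largest weight norm), the `shrink` factor, and the `fine_tune` callback are all assumptions introduced for illustration.

```python
def contract_layer(neurons, target_width, shrink=0.5, fine_tune=None):
    """Shrink a layer step by step until it has target_width neurons.

    neurons: list of per-neuron weight vectors.
    fine_tune: optional callable applied to the surviving neurons after each
    round (a stand-in for retraining the network).
    Returns the remaining neurons and the number of contraction rounds.
    """
    rounds = 0
    while len(neurons) > target_width:
        keep = max(target_width, int(len(neurons) * shrink))
        keep = min(keep, len(neurons) - 1)  # always remove at least one neuron
        # Assumed selection rule: rank neurons by squared weight norm.
        ranked = sorted(neurons, key=lambda w: sum(x * x for x in w), reverse=True)
        neurons = ranked[:keep]
        if fine_tune is not None:
            neurons = fine_tune(neurons)
        rounds += 1
    return neurons, rounds
```

Starting from 100 neurons with a target width of 1 (a 99% reduction, matching the scale quoted in the abstract), this schedule halves the layer six times, giving the network a chance to recover after each smaller cut instead of after one drastic one.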