
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

A Note on Totally Symmetric Equi-Isoclinic Tight Fusion Frames
M. Fickus, Joseph W. Iverson, J. Jasper, D. Mixon
Consider the fundamental problem of arranging r-dimensional subspaces of ℝd so as to maximize the minimum distance between unit vectors in different subspaces. It is well known that equi-isoclinic tight fusion frames (EITFFs) are optimal for this packing problem, but such ensembles are notoriously hard to construct. In this paper, we present a novel construction of EITFFs that are totally symmetric: any permutation of the subspaces can be realized by an orthogonal transformation of ℝd.
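As a concrete companion to the definitions above, the following sketch checks the two EITFF properties for a collection of subspaces given by orthonormal bases: tightness (the orthogonal projections sum to a multiple of the identity) and equi-isoclinicity (every cross-Gram block has r equal singular values, with a common value across pairs). The function name and tolerances are our own; this encodes the standard definitions, not the paper's construction.

```python
import numpy as np

def is_eitff(bases, tol=1e-8):
    # bases: list of d x r matrices with orthonormal columns.
    n = len(bases)
    d, r = bases[0].shape
    # Tightness: the orthogonal projections must sum to (n*r/d) * I.
    P = sum(U @ U.T for U in bases)
    tight = np.allclose(P, (n * r / d) * np.eye(d), atol=tol)
    # Equi-isoclinic: every cross-Gram block U_i^T U_j has r equal
    # singular values (isoclinic pair), shared across all pairs.
    cosines = []
    for i in range(n):
        for j in range(i + 1, n):
            s = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            if not np.allclose(s, s[0], atol=tol):
                return False
            cosines.append(s[0])
    return tight and np.allclose(cosines, cosines[0], atol=tol)

# Two orthogonal lines in R^2 form a trivial EITFF (common angle 90 degrees):
print(is_eitff([np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]))  # True
```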
{"title":"A Note on Totally Symmetric Equi-Isoclinic Tight Fusion Frames","authors":"M. Fickus, Joseph W. Iverson, J. Jasper, D. Mixon","doi":"10.1109/icassp43922.2022.9746835","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746835","url":null,"abstract":"Consider the fundamental problem of arranging r-dimensional subspaces of Rd in such a way that maximizes the minimum distance between unit vectors in different subspaces. It is well known that equi-isoclinic tight fusion frames (EITFFs) are optimal for this packing problem, but such ensembles are notoriously hard to construct. In this paper, we present a novel construction of EITFFs that are totally symmetric: any permutation of the subspaces can be realized by an orthogonal transformation of ℝd.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Recognition of Silently Spoken Word from EEG Signals Using Dense Attention Network (DAN)
Sahil Datta, A. Aondoakaa, Jorunn Jo Holmberg, E. Antonova
In this paper, we propose a method for recognizing silently spoken words from electroencephalogram (EEG) signals using a Dense Attention Network (DAN). The proposed network learns features from the EEG data by applying the self-attention mechanism on the temporal, spectral, and spatial (electrode) dimensions. We examined the effectiveness of the proposed network in extracting spatio-spectro-temporal information from EEG signals and provide a network for recognition of silently spoken words. The DAN achieved recognition rates of 80.7% under leave-trials-out (LTO) and 75.1% under leave-subject-out (LSO) cross-validation. In a direct comparison, the DAN outperformed other existing techniques in recognizing silently spoken words.
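As a rough illustration of per-dimension self-attention (not the paper's DAN architecture, whose details are not given here), the sketch below attends along one chosen axis of an EEG tensor, treating the flattened remaining axes as the token features. It assumes PyTorch 2.x for F.scaled_dot_product_attention, and the tensor layout (batch, electrodes, frequency bands, time frames) is our own convention.

```python
import torch
import torch.nn.functional as F

def axis_self_attention(x, axis):
    # x: EEG features shaped (batch, electrodes, freq, time).
    # Tokens are positions along `axis` (1 = spatial, 2 = spectral,
    # 3 = temporal); features are the flattened remaining two axes.
    rest = [a for a in (1, 2, 3) if a != axis]
    perm = [0, axis] + rest
    inv = [perm.index(i) for i in range(4)]            # inverse permutation
    xp = x.permute(*perm)                              # (B, L, A, C)
    tokens = xp.reshape(xp.shape[0], xp.shape[1], -1)  # (B, L, A*C)
    out = F.scaled_dot_product_attention(tokens, tokens, tokens)
    return out.reshape(xp.shape).permute(*inv)         # original layout

eeg = torch.randn(2, 14, 16, 128)   # 14 electrodes, 16 bands, 128 frames
fused = sum(axis_self_attention(eeg, ax) for ax in (1, 2, 3))
```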
{"title":"Recognition Of Silently Spoken Word From Eeg Signals Using Dense Attention Network (DAN)","authors":"Sahil Datta, A. Aondoakaa, Jorunn Jo Holmberg, E. Antonova","doi":"10.1109/icassp43922.2022.9746241","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746241","url":null,"abstract":"In this paper, we propose a method for recognizing silently spoken words from electroencephalogram (EEG) signals using a Dense Attention Network (DAN). The proposed network learns features from the EEG data by applying the self-attention mechanism on temporal, spectral, and spatial (electrodes) dimensions. We examined the effectiveness of the proposed network in extracting spatio-spectro-temporal in-formation from EEG signals and provide a network for recognition of silently spoken words. The DAN achieved a recognition rate of 80.7% in leave-trials-out (LTO) and 75.1% in leave-subject-out (LSO) cross validation methods. In a direct comparison with other methods, the DAN outperformed other existing techniques in recognition of silently spoken words.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127918229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Video Frame Interpolation via Local Lightweight Bidirectional Encoding with Channel Attention Cascade
Xiangling Ding, Pu Huang, Dengyong Zhang, Xianfeng Zhao
Deep-neural-network-based video frame interpolation, which synthesizes in-between frames from two consecutive neighboring frames, typically depends on heavy model architectures, preventing deployment on small terminals. When a lightweight network architecture is adopted directly from these models, the synthesized frames may suffer from poor visual appearance. In this paper, a lightweight-driven video frame interpolation network (L2BEC2) is proposed. Concretely, we first improve the visual appearance by introducing a bidirectional encoding structure with a channel attention cascade to better characterize motion information; we then apply the local network lightweighting idea to this structure to eliminate the redundant parts of its model parameters. As a result, our L2BEC2 performs favorably with only one third of the parameters of state-of-the-art methods on public datasets. Our source code is available at https://github.com/Pumpkin123709/LBEC.git.
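A channel attention cascade can be sketched as a chain of squeeze-and-excitation style blocks with residual connections, as below. This is a generic illustration under our own layer choices; L2BEC2's actual block layout, bidirectional encoding, and lightweighting scheme are not reproduced here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """One squeeze-and-excitation style channel-attention block."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                     # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                # per-channel gates in (0,1)
        )

    def forward(self, x):
        return x * self.fc(x)                            # reweight channels

class ChannelAttentionCascade(nn.Module):
    """Chain several channel-attention blocks with residual links."""
    def __init__(self, channels, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(ChannelAttention(channels) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = x + blk(x)
        return x

feats = ChannelAttentionCascade(32)(torch.randn(1, 32, 64, 64))  # same shape out
```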
{"title":"Video Frame Interpolation via Local Lightweight Bidirectional Encoding with Channel Attention Cascade","authors":"Xiangling Ding, Pu Huang, Dengyong Zhang, Xianfeng Zhao","doi":"10.1109/icassp43922.2022.9747182","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747182","url":null,"abstract":"Deep Neural Networks based video frame interpolation, synthesizing in-between frames given two consecutive neighboring frames, typically depends on heavy model architectures, preventing them from being deployed on small terminals. When directly adopting the lightweight network architecture from these models, the synthesized frames may suffer from poor visual appearance. In this paper, a lightweight-driven video frame interpolation network (L2BEC2) is proposed. Concretely, we first improve the visual appearance by introducing the bidirectional encoding structure with channel attention cascade to better characterize the motion information; then we further adopt the local network lightweight idea into the aforementioned structure to significantly eliminate its redundant parts of the model parameters. As a result, our L2BEC2 performs favorably at the cost of only one third of the parameters compared with the state-of-the-art methods on public datasets. Our source code is available at https://github.com/Pumpkin123709/LBEC.git.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121439858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
SDETR: Attention-Guided Salient Object Detection with Transformer
Guanze Liu, Bo Xu, Han Huang, Cheng Lu, Yandong Guo
Most existing CNN-based salient object detection methods can identify fine-grained segmentation details like hair and animal fur, but often mispredict the salient object due to a lack of global contextual information caused by the locality of convolution layers. The limited training data of the current SOD task adds further difficulty to capturing saliency information. In this paper, we propose a two-stage predict-refine SDETR model that leverages the benefits of both transformer and CNN layers to produce results with accurate saliency prediction and fine-grained local details. We also propose a novel pre-training dataset annotation, COCO SOD, to alleviate the overfitting problem caused by insufficient training data. Comprehensive experiments on five benchmark datasets demonstrate that SDETR outperforms state-of-the-art approaches on four evaluation metrics, and our COCO SOD largely improves model performance on the DUTS, ECSSD, DUT, and PASCAL-S datasets.
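The two-stage predict-refine idea can be illustrated as follows: a transformer encoder over patch tokens predicts a coarse saliency map from global context, and a small CNN refines it with fine-grained local detail. All layer sizes and the fusion scheme below are placeholders of our own, not SDETR's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictRefine(nn.Module):
    def __init__(self, patch=16, dim=64):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, patch, stride=patch)        # patchify
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # global stage
        self.coarse_head = nn.Linear(dim, 1)                       # coarse saliency
        self.refine = nn.Sequential(                               # local CNN stage
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, img):
        B, _, H, W = img.shape
        tok = self.embed(img)                           # (B, dim, h, w)
        h, w = tok.shape[-2:]
        seq = self.encoder(tok.flatten(2).transpose(1, 2))
        coarse = self.coarse_head(seq).transpose(1, 2).reshape(B, 1, h, w)
        coarse = F.interpolate(coarse, size=(H, W), mode="bilinear",
                               align_corners=False)
        # Stage 2: refine the upsampled coarse map with image detail.
        return self.refine(torch.cat([img, coarse], dim=1))

saliency = PredictRefine()(torch.randn(1, 3, 224, 224))  # (1, 1, 224, 224)
```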
{"title":"SDETR: Attention-Guided Salient Object Detection with Transformer","authors":"Guanze Liu, Bo Xu, Han Huang, Cheng Lu, Yandong Guo","doi":"10.1109/icassp43922.2022.9746367","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746367","url":null,"abstract":"Most existing CNN-based salient object detection methods can identify fine-grained segmentation details like hair and animal fur, but often mispredict the salient object due to lack of global contextual information caused by locality convolution layers. The limited training data of the current SOD task adds additional difficulty to capture the saliency information. In this paper, we propose a two-stage predict-refine SDETR model to leverage both benefits of transformer and CNN layers that can produce results with accurate saliency prediction and fine-grained local details. We also propose a novel pre-train dataset annotation COCO SOD to erase the overfitting problem caused by insufficient training data. Comprehensive experiments on five benchmark datasets demonstrate that the SDETR outperforms state-of-the-art approaches on four evaluation metrics, and our COCO SOD can largely improve the model performance on DUTS, ECSSD, DUT, PASCAL-S datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115916725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Sensors to Sign Language: A Natural Approach to Equitable Communication
T. Fouts, Ali Hindy, C. Tanner
Sign Language Recognition (SLR) aims to improve the equity of communication with the hearing impaired. However, SLR typically relies on having recorded videos of the signer. We develop a more natural solution by fitting a signer with arm sensors and classifying the sensor signals directly into language. We refer to this task as Sensors-to-Sign-Language (STSL). While existing STSL systems demonstrate effectiveness with small vocabularies of fewer than 100 words, we aim to determine if STSL can scale to larger, more realistic lexicons. For this purpose, we introduce a new dataset, SignBank, which consists of exactly 6,000 signs, spans 558 distinct words from 15 different novice signers, and constitutes the largest such dataset. By using a simple but effective model for STSL, we demonstrate a strong baseline performance on SignBank. Notably, despite our model having trained on only four signings of each word, it is able to correctly classify new signings with 95.1% accuracy (out of 558 candidate words). This work enables and motivates further development of lightweight, wearable hardware and real-time modelling for SLR.
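A minimal STSL baseline in this spirit is a 1D convolutional classifier that maps a window of multi-channel arm-sensor signals to one vocabulary word. The channel count and window length below are assumptions; only the 558-word vocabulary size comes from the abstract, and the paper's actual model is not reproduced.

```python
import torch
import torch.nn as nn

class SensorWordClassifier(nn.Module):
    def __init__(self, in_channels=8, n_words=558):   # 558 words as in SignBank
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
            nn.Flatten(),
            nn.Linear(64, n_words),    # one logit per vocabulary word
        )

    def forward(self, x):              # x: (batch, sensor channels, time)
        return self.net(x)

logits = SensorWordClassifier()(torch.randn(4, 8, 256))  # (4, 558)
```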
{"title":"Sensors to Sign Language: A Natural Approach to Equitable Communication","authors":"T. Fouts, Ali Hindy, C. Tanner","doi":"10.1109/icassp43922.2022.9747385","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747385","url":null,"abstract":"Sign Language Recognition (SLR) aims to improve the equity of communication with the hearing impaired. However, SLR typically relies on having recorded videos of the signer. We develop a more natural solution by fitting a signer with arm sensors and classifying the sensor signals directly into language. We refer to this task as Sensors-to-Sign-Language (STSL). While existing STSL systems demonstrate effectiveness with small vocabularies of fewer than 100 words, we aim to determine if STSL can scale to larger, more realistic lexicons. For this purpose, we introduce a new dataset, SignBank, which consists of exactly 6,000 signs, spans 558 distinct words from 15 different novice signers, and constitutes the largest such dataset. By using a simple but effective model for STSL, we demonstrate a strong baseline performance on SignBank. Notably, despite our model having trained on only four signings of each word, it is able to correctly classify new signings with 95.1% accuracy (out of 558 candidate words). This work enables and motivates further development of lightweight, wearable hardware and real-time modelling for SLR.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113961361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
ORCA-PARTY: An Automatic Killer Whale Sound Type Separation Toolkit Using Deep Learning
Christian Bergler, M. Schmitt, A. Maier, R. Cheng, Volker Barth, E. Nöth
Data-driven and machine-based analysis of massive bioacoustic data collections, in particular acoustic regions containing a substantial number of vocalization events, is essential and extremely valuable for identifying recurring vocal paradigms. However, these acoustic sections are usually characterized by a high incidence of overlapping vocalization events, a major problem severely affecting subsequent human- or machine-based analysis and interpretation. Robust machine-driven signal separation of species-specific call types is extremely challenging due to missing ground-truth data, missing speaker/source-relevant information, and limited knowledge about inter- and intra-call-type variation, in addition to diverse recording conditions. The current study is the first to introduce a fully automated deep signal separation approach for overlapping orca vocalizations, addressing all of the previously mentioned challenges, together with one of the largest bioacoustic data archives recorded on killer whales (Orcinus orca). Incorporating ORCA-PARTY as an additional data-enhancement step for downstream call type classification proved extremely valuable. Besides demonstrating cross-domain applicability and consistently promising results on non-overlapping signals, significant improvements were achieved when processing acoustic orca segments comprising a multitude of vocal activities. Apart from encouraging visual inspections, a final numerical evaluation on an unseen dataset showed that about 30% more known sound patterns could be identified.
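Deep separation toolkits of this kind typically have a network predict one soft mask per sound type and multiply each mask with the mixture spectrogram. The snippet below shows only that generic masking step; ORCA-PARTY's network and sound-type inventory are not reproduced here.

```python
import torch

def apply_separation_masks(mixture_mag, mask_logits):
    """mixture_mag: (batch, freq, time) magnitude spectrogram of the mix.
    mask_logits:  (batch, n_types, freq, time) raw network outputs.
    Returns per-sound-type estimates of shape (batch, n_types, freq, time)."""
    masks = torch.softmax(mask_logits, dim=1)    # competing types sum to 1
    return masks * mixture_mag.unsqueeze(1)      # element-wise masking

est = apply_separation_masks(torch.rand(2, 257, 100), torch.randn(2, 4, 257, 100))
```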
{"title":"ORCA-PARTY: An Automatic Killer Whale Sound Type Separation Toolkit Using Deep Learning","authors":"Christian Bergler, M. Schmitt, A. Maier, R. Cheng, Volker Barth, E. Nöth","doi":"10.1109/icassp43922.2022.9746623","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746623","url":null,"abstract":"Data-driven and machine-based analysis of massive bioacoustic data collections, in particular acoustic regions containing a substantial number of vocalizations events, is essential and extremely valuable to identify recurring vocal paradigms. However, these acoustic sections are usually characterized by a strong incidence of overlapping vocalization events, a major problem severely affecting subsequent human-/machine-based analysis and interpretation. Robust machine-driven signal separation of species-specific call types is extremely challenging due to missing ground truth data, speaker/source-relevant information, limited knowledge about inter- and intra-call type variations, next to diverse recording conditions. The current study is the first introducing a fully-automated deep signal separation approach for overlapping orca vocalizations, addressing all of the previously mentioned challenges, together with one of the largest bioacoustic data archives recorded on killer whales (Orcinus Orca). Incorporating ORCA-PARTY as additional data enhancement step for downstream call type classification demonstrated to be extremely valuable. Besides the proof of cross-domain applicability and consistently promising results on non-overlapping signals, significant improvements were achieved when processing acoustic orca segments comprising a multitude of vocal activities. Apart from auspicious visual inspections, a final numerical evaluation on an unseen dataset proved that about 30 % more known sound patterns could be identified.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131392259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Disentangled Feature-Guided Multi-Exposure High Dynamic Range Imaging
Keun-Ohk Lee, Y. Jang, N. Cho
Multi-exposure high dynamic range (HDR) imaging aims to generate an HDR image from multiple differently exposed low dynamic range (LDR) images. It is a challenging task due to two major problems: (1) there are usually misalignments among the input LDR images, and (2) LDR images often have incomplete information due to under-/over-exposure. In this paper, we propose a disentangled feature-guided HDR network (DFGNet) to alleviate the above-stated problems. Specifically, we first extract and disentangle exposure features and spatial features of input LDR images. Then, we process these features through the proposed DFG modules, which produce a high-quality HDR image. Experiments show that the proposed DFGNet achieves outstanding performance on a benchmark dataset. Our code and more results are available at https://github.com/KeuntekLee/DFGNet.
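The disentangling step can be caricatured as a two-branch encoder: a spatial branch that keeps the feature-map resolution and an exposure branch that pools to a global code. The branch design below is our own toy sketch, not DFGNet's DFG modules.

```python
import torch
import torch.nn as nn

class DisentangleEncoder(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.spatial = nn.Sequential(              # keeps spatial resolution
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.exposure = nn.Sequential(             # pools to a global code
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, ldr):                        # ldr: (batch, 3, H, W)
        return self.spatial(ldr), self.exposure(ldr).flatten(1)

feat_map, exp_code = DisentangleEncoder()(torch.randn(2, 3, 64, 64))
# feat_map: (2, 32, 64, 64) spatial features; exp_code: (2, 32) exposure code
```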
{"title":"Disentangled Feature-Guided Multi-Exposure High Dynamic Range Imaging","authors":"Keun-Ohk Lee, Y. Jang, N. Cho","doi":"10.1109/icassp43922.2022.9747329","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747329","url":null,"abstract":"Multi-exposure high dynamic range (HDR) imaging aims to generate an HDR image from multiple differently exposed low dynamic range (LDR) images. It is a challenging task due to two major problems: (1) there are usually misalignments among the input LDR images, and (2) LDR images often have incomplete information due to under-/over-exposure. In this paper, we propose a disentangled feature-guided HDR network (DFGNet) to alleviate the above-stated problems. Specifically, we first extract and disentangle exposure features and spatial features of input LDR images. Then, we process these features through the proposed DFG modules, which produce a high-quality HDR image. Experiments show that the proposed DFGNet achieves outstanding performance on a benchmark dataset. Our code and more results are available at https://github.com/KeuntekLee/DFGNet.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132216010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Harmonic and Percussive Sound Separation Based on Mixed Partial Derivative of Phase Spectrogram
Natsuki Akaishi, K. Yatabe, Yasuhiro Oikawa
Harmonic and percussive sound separation (HPSS) is a widely applied pre-processing tool that extracts the distinct (harmonic and percussive) components of a signal. In previous methods, HPSS has been performed based on the structural properties of magnitude (or power) spectrograms. However, such an approach does not take advantage of the phase, which contains useful information about the waveform. In this paper, we propose a novel HPSS method named MipDroP that relies only on phase and does not use information from magnitude spectrograms. The proposed MipDroP algorithm effectively examines the phase through its mixed partial derivative and constructs a pair of masks for the separation. Our experiments showed that MipDroP extracts percussive components better than other methods.
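The key quantity, the mixed partial derivative of the STFT phase with respect to time and frequency, can be estimated by wrapped finite differences, as in the sketch below. This is a generic estimate of that derivative (using scipy.signal.stft); the MipDroP mask construction itself is not reproduced.

```python
import numpy as np
from scipy.signal import stft

def mixed_phase_derivative(x, fs, nperseg=1024):
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    phase = np.angle(Z)                                  # (freq, time)
    # Wrapped finite differences: first along time, then along frequency,
    # mapping each difference back to the principal value (-pi, pi].
    dt = np.angle(np.exp(1j * np.diff(phase, axis=1)))   # d phi / dt
    dtdf = np.angle(np.exp(1j * np.diff(dt, axis=0)))    # d^2 phi / (dt df)
    return dtdf

x = np.random.randn(16000)                # 1 s of noise at 16 kHz
D = mixed_phase_derivative(x, fs=16000)   # feature map a mask could threshold
```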
{"title":"Harmonic and Percussive Sound Separation Based on Mixed Partial Derivative of Phase Spectrogram","authors":"Natsuki Akaishi, K. Yatabe, Yasuhiro Oikawa","doi":"10.1109/icassp43922.2022.9747057","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747057","url":null,"abstract":"Harmonic and percussive sound separation (HPSS) is a widely applied pre-processing tool that extracts distinct (harmonic and percussive) components of a signal. In the previous methods, HPSS has been performed based on the structural properties of magnitude (or power) spectrograms. However, such approach does not take advantage of phase that contains useful information of the waveform. In this paper, we propose a novel HPSS method named MipDroP that relies only on phase and does not use information of magnitude spectrograms. The proposed MipDroP algorithm effectively examines phase through its mixed partial derivative and constructs a pair of masks for the separation. Our experiments showed that MipDroP can extract percussive components better than the other methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130080514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Regression Assisted Matrix Completion for Reconstructing a Propagation Field with Application to Source Localization
Hao Sun, Junting Chen
This paper develops a regression-assisted matrix completion method to reconstruct the propagation field for received signal strength (RSS) based source localization without prior knowledge of the propagation model. Existing matrix completion methods do not exploit the fact that the uncertainty of each observed entry differs, since the sensor density may vary across locations. This paper proposes to employ local polynomial regression to increase the accuracy of matrix completion. First, the values of selected entries of the matrix are estimated via interpolation from local measurements, and the interpolation error is analyzed. Then, a matrix completion problem that accounts for the differing uncertainty of the observed entries is formulated and solved. Numerical results demonstrate that the proposed method significantly improves the performance of matrix completion and, as a result, increases the localization accuracy.
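Once interpolation errors are known, the completion can weight each observed entry by its reliability. The sketch below solves a weighted low-rank factorization by gradient descent, with weights W encoding per-entry trust (e.g., inversely proportional to the interpolation-error variance) and W = 0 marking unobserved entries; it is a simple stand-in, not the paper's solver.

```python
import numpy as np

def weighted_completion(M, W, rank=3, iters=2000, lr=0.01, seed=0):
    """Minimize sum_ij W_ij * (M_ij - (U V^T)_ij)^2 over rank-`rank`
    factors U, V by gradient descent. W_ij = 0 marks unobserved entries;
    larger W_ij means a more trusted observation."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    M0 = np.nan_to_num(M)                    # zeros where unobserved (W = 0)
    for _ in range(iters):
        R = W * (U @ V.T - M0)               # uncertainty-weighted residual
        # Simultaneous update keeps the two gradient terms consistent.
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
    return U @ V.T
```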
{"title":"Regression Assisted Matrix Completion for Reconstructing a Propagation Field with Application to Source Localization","authors":"Hao Sun, Junting Chen","doi":"10.1109/icassp43922.2022.9746415","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746415","url":null,"abstract":"This paper develops a regression assisted matrix completion method to reconstruct the propagation field for received signal strength (RSS) based source localization without prior knowledge of the propagation model. Existing matrix completion methods did not exploit the fact that the uncertainty of each observed entry is different due to the reality that the sensor density may vary across different locations. This paper proposes to employ local polynomial regression to increase the accuracy of matrix completion. First, the values of selected entries of a matrix are estimated via interpolation from local measurements, and the interpolation error is analyzed. Then, a matrix completion problem that is aware of the different uncertainty of observed entries is formulated and solved. It is demonstrated that the proposed method significantly improves the performance of matrix completion, and as a result, increases the localization accuracy from the numerical results.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133831969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
How Secure Are The Adversarial Examples Themselves?
Hui Zeng, Kang Deng, Biwei Chen, Anjie Peng
Existing adversarial example generation algorithms mainly consider the success rate of spoofing the target model, but pay little attention to the security of the examples themselves. In this paper, we define adversarial example security as how unlikely the examples themselves are to be detected. A two-step test is proposed to deal with adversarial attacks of different strengths. Game theory is introduced to model the interplay between the attacker and the investigator. By solving for the Nash equilibrium, the optimal strategies of both parties are obtained and the security of the attacks is evaluated. Five typical attacks are compared on ImageNet. The results show that a rational attacker tends to use a relatively weak attack strength. By comparing the ROC curves under the Nash equilibrium, we observe that constrained-perturbation attacks are more secure than optimized-perturbation attacks in the face of the two-step test. The proposed framework can be used to evaluate the security of various potential attacks and to further research on adversarial example generation and detection.
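For intuition on the equilibrium computation, the sketch below solves a small zero-sum matrix game via the standard linear program for the row player's maximin strategy. Treating the attacker-investigator game as zero-sum is our simplifying assumption for illustration; the paper's payoff structure may differ.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash(A):
    # Row player maximizes x^T A y; her maximin strategy solves an LP:
    # maximize v subject to (A^T x)_j >= v for all j, x on the simplex.
    m, n = A.shape
    c = np.r_[np.zeros(m), -1.0]                    # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])       # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], -res.fun                      # strategy, game value

# Matching pennies: the equilibrium mixes uniformly and the value is 0.
x, v = zero_sum_nash(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(np.round(x, 3), round(v, 3))                  # [0.5 0.5] 0.0
```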
{"title":"How Secure Are The Adversarial Examples Themselves?","authors":"Hui Zeng, Kang Deng, Biwei Chen, Anjie Peng","doi":"10.1109/ICASSP43922.2022.9747206","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747206","url":null,"abstract":"Existing adversarial example generation algorithms mainly consider the success rate of spoofing target model, but pay little attention to its own security. In this paper, we propose the concept of adversarial example security as how unlikely themselves can be detected. A two-step test is proposed to deal with the adversarial attacks of different strengths. Game theory is introduced to model the interplay between the attacker and the investigator. By solving Nash equilibrium, the optimal strategies of both parties are obtained, and the security of the attacks is evaluated. Five typical attacks are compared on the ImageNet. The results show that a rational attacker tends to use a relatively weak strength. By comparing the ROC curves under Nash equilibrium, it is observed that the constrained perturbation attacks are more secure than the optimized perturbation attacks in face of the two-step test. The proposed framework can be used to evaluate the security of various potential attacks and further the research of adversarial example generation/detection.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134428895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0