
2022 IEEE International Conference on Signal Processing and Communications (SPCOM): Latest Publications

A Hierarchical Approach for Decoding Human Reach-and-Grasp Activities based on EEG Signals
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840794
Bhagyasree Kanuparthi, A. Turlapaty
Physically disabled patients, such as paralysis, amputee, and stroke patients, find it difficult to perform daily activities on their own. A Brain-Computer Interface (BCI) using Electroencephalography (EEG) signals is an option for the rehabilitation of these patients. The BCI function can be enhanced by decoding limb movements for intuitive control of a prosthetic arm. However, decoding them with traditional classifiers is a challenging task. In this paper, a two-stage hierarchical framework is proposed for decoding reach-and-grasp actions. In stage-1, the action signals are separated from rest segments based on power spectral density features and a fine k-nearest neighbor (FKNN) classifier. In stage-2, the signals identified as action are further classified into palmar and lateral reach-and-grasp actions using mean absolute value features with the FKNN classifier. In comparison with existing classifiers, the proposed method achieves a superior test accuracy of 85.38%.
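A minimal Python sketch of the two-stage decision logic described above, using scikit-learn. The Welch-based PSD and mean-absolute-value feature helpers, the label coding (1 = action), and the use of k=1 for the "fine" KNN are assumptions for illustration; the paper's exact feature settings and classifier hyperparameters are not given here.

```python
import numpy as np
from scipy.signal import welch
from sklearn.neighbors import KNeighborsClassifier

def psd_features(segments, fs=256):
    # Welch power spectral density per channel, flattened into one vector per
    # EEG segment; segments has shape (n_segments, n_channels, n_samples).
    return np.array([welch(seg, fs=fs, axis=-1)[1].ravel() for seg in segments])

def mav_features(segments):
    # Mean absolute value per channel for each segment.
    return np.array([np.mean(np.abs(seg), axis=-1) for seg in segments])

# Stage 1 separates rest from action; stage 2 separates palmar from lateral grasps.
stage1 = KNeighborsClassifier(n_neighbors=1)
stage2 = KNeighborsClassifier(n_neighbors=1)

def fit(train_segments, rest_or_action, grasp_type, fs=256):
    stage1.fit(psd_features(train_segments, fs), rest_or_action)
    action = rest_or_action == 1                      # assumed label coding
    stage2.fit(mav_features(train_segments[action]), grasp_type[action])

def predict(segments, fs=256):
    # Hierarchical decision: only segments flagged as action reach stage 2.
    labels = np.full(len(segments), "rest", dtype=object)
    is_action = stage1.predict(psd_features(segments, fs)) == 1
    if is_action.any():
        labels[is_action] = stage2.predict(mav_features(segments[is_action]))
    return labels
```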
Citations: 0
Glottal instants extraction from speech signal using Deep Feature Loss
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840808
Supritha M. Shetty, Suraj Durgesht, K. Deepak
The Electroglottograph (EGG) is a device used to measure the conductance between the vocal folds. The analysis of the EGG signal has many applications in the literature, such as speech-to-text synthesis, voice disorder analysis, emotion recognition, and speaker verification. Therefore, the EGG device is essential for recording vocal fold activity. Alternatively, a new method is proposed in this work to synthesize the EGG waveform from the speech signal using a context aggregation convolutional neural network. The synthesis network is trained by accounting for the deep feature losses obtained by comparing its output against the reference through another network, called the EGG classification network. The synthesized EGG signal then needs to be characterized. During voiced speech production, the instants at which the vocal folds attain complete closure are called glottal closure instants (GCIs). Likewise, the opening instants are called glottal opening instants (GOIs). Such instants are reliably measured using the EGG signal. The performance of the proposed method is compared with other state-of-the-art techniques. The CMU-Arctic database has a parallel corpus of speech and EGG signals recorded simultaneously; this database is used for training the synthesis network and for comparison purposes. It is found that the performance of extracting glottal instants from synthesized EGG signals is comparable to that of other methods.
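A minimal PyTorch sketch of a deep-feature-loss term of the kind described above: intermediate activations of a frozen EGG classification network are compared between the synthesized and reference EGG. The choice of layers, the L1 distance, and equal layer weighting are assumptions; the context-aggregation synthesis network itself is not shown.

```python
import torch
import torch.nn as nn

class DeepFeatureLoss(nn.Module):
    """Sum of L1 distances between intermediate activations of a frozen
    classification network, evaluated on synthesized vs. reference EGG."""
    def __init__(self, feature_layers):
        super().__init__()
        # feature_layers: ordered layers taken from the frozen classifier.
        self.feature_layers = nn.ModuleList(feature_layers)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, egg_synth, egg_ref):
        loss, x, y = 0.0, egg_synth, egg_ref
        for layer in self.feature_layers:
            x, y = layer(x), layer(y)
            loss = loss + torch.mean(torch.abs(x - y))
        return loss

# Usage sketch (egg_classifier is a hypothetical pre-trained network):
# criterion = DeepFeatureLoss(list(egg_classifier.children())[:-1])
# loss = nn.functional.l1_loss(egg_synth, egg_ref) + criterion(egg_synth, egg_ref)
```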
Citations: 1
Increasing Transferability by Imposing Linearity and Perturbation in Intermediate Layer with Diverse Input Patterns
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840512
Meet Shah, Srimanta Mandal, Shruti Bhilare, Avik Hati
Despite high prediction accuracy, deep networks are vulnerable to adversarial attacks, designed by inducing human-indiscernible perturbations to clean images. Hence, adversarial samples can mislead already trained deep networks. The process of generating adversarial examples can assist us in investigating the robustness of different models. Many developed adversarial attacks often fail under challenging black-box settings. Hence, it is required to improve transferability of adversarial attacks to an unknown model. In this aspect, we propose to increase the rate of transferability by inducing linearity in a few intermediate layers of architecture. The proposed design does not disturb the original architecture much. The design focuses on significance of intermediate layers in generating feature maps suitable for a task. By analyzing the intermediate feature maps of architecture, a particular layer can be more perturbed to improve the transferability. The performance is further enhanced by considering diverse input patterns. Experimental results demonstrate the success in increasing the transferability of our proposition.
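A simplified PyTorch sketch of the general idea of attacking an intermediate layer: the input is perturbed, within a small pixel budget, to push a chosen layer's feature map away from its clean value, which tends to transfer better across models. The layer choice, step sizes, and loss are illustrative assumptions; the paper's specific linearity injection and diverse-input transformations are not reproduced here.

```python
import torch

def intermediate_layer_attack(model, layer, x, eps=8/255, steps=10, alpha=2/255):
    """Iterative perturbation that pushes a chosen intermediate feature map
    away from its value on the clean image (illustrative sketch only)."""
    feats = {}
    hook = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    model.eval()
    with torch.no_grad():
        model(x)
    clean_feat = feats["out"].detach()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)
        # Larger distortion of intermediate features is used as the attack objective.
        loss = torch.norm(feats["out"] - clean_feat)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    hook.remove()
    return x_adv
```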
Citations: 0
Temporal Surgical Gesture Segmentation and Classification in Multi-gesture Robotic Surgery using Fine-tuned features and Calibrated MS-TCN
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840779
Snigdha Agarwal, Chakka Sai Pradeep, N. Sinha
Temporal gesture segmentation is an active research problem for many applications such as surgical skill assessment, surgery training, and robotic training. In this paper, we propose a novel method for gesture segmentation on untrimmed surgical videos of the challenging JIGSAWS dataset using a two-step methodology. We train and evaluate our method on 39 videos of the Suturing task, which contains 10 gestures. Gesture lengths range from 1 second to 75 seconds, and full video lengths vary from 1 minute to 5 minutes. In step one, we extract encoded frame-wise spatio-temporal features at the full temporal resolution of the untrimmed videos. In step two, we use these extracted features to identify gesture segments for temporal segmentation and classification. To extract high-quality features from the surgical videos, we also pre-train gesture classification models on the JIGSAWS dataset using transfer learning with two state-of-the-art pretrained backbone architectures. For segmentation, we propose an improved calibrated MS-TCN (CMS-TCN) by introducing a smoothed focal loss as the loss function, which helps regularize our TCN and avoid over-confident decisions. We achieve a frame-wise accuracy of 89.8% and an Edit Distance score of 91.5%, an improvement of 2.2% over previous works. We also propose a novel evaluation metric that normalizes, in a single score, the effect of correctly classifying the frames of larger segments versus smaller segments.
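A short PyTorch sketch of a "smoothed" focal loss of the kind mentioned above, here taken to mean the standard focal loss combined with label smoothing and applied frame-wise. The exact formulation and the gamma/smoothing values used in the paper are not specified here, so these are assumptions.

```python
import torch
import torch.nn.functional as F

def smoothed_focal_loss(logits, targets, gamma=2.0, smoothing=0.1):
    """Focal loss with label smoothing, applied frame-wise.
    logits: (frames, classes); targets: (frames,) integer gesture labels."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Smoothed one-hot targets discourage over-confident frame predictions.
    with torch.no_grad():
        true_dist = torch.full_like(log_probs, smoothing / (n_classes - 1))
        true_dist.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    focal_weight = (1.0 - probs) ** gamma
    return -(true_dist * focal_weight * log_probs).sum(dim=-1).mean()
```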
Citations: 1
Computer-aided Cataract Grading Under Adversarial Environment
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840821
T. Pratap, Priyanka Kokil
Cataract is the most common cause of blindness in the world. Early detection and treatment can lower the risk of cataract progression. The diagnostic performance of existing computer-aided cataract grading (CACG) methods often deteriorates due to the sophisticated image capture technology. Common retinal fundus image aberrations, such as noise and blur, are unavoidable in practice. In this paper, a CACG method is proposed to achieve robust cataract grading under adversarial conditions such as noise and blur. The presented CACG method is designed using three deep neural network variants. Each variant is fine-tuned individually on good, noisy, and blurred retinal fundus images to achieve optimum performance. Further, an input image quality detection module is incorporated in the proposed CACG method to detect input image distortion and route the input image to the appropriate deep neural network variant. Gaussian noise and blur models are used to evaluate the effectiveness of the suggested CACG method. The proposed CACG approach exhibits superior performance to existing methods under adversarial conditions.
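A schematic Python sketch of the routing step just described: a quality detector labels the fundus image as good, noisy, or blurred, and the image is dispatched to the variant fine-tuned for that distortion. The detector, the three networks, and the class names are placeholders, not the paper's implementation.

```python
def grade_cataract(image, quality_detector, graders):
    """Route a fundus image to the grading network fine-tuned for its
    detected quality class ('good', 'noisy', or 'blur')."""
    quality = quality_detector(image)   # hypothetical detector returning a class name
    return graders[quality](image)      # variant fine-tuned on that distortion type

# Usage sketch (all names hypothetical):
# graders = {"good": net_good, "noisy": net_noisy, "blur": net_blur}
# grade = grade_cataract(img, detect_quality, graders)
```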
Citations: 1
A unified neural MRA architecture combining wavelet CNN and wavelet pooling for texture classification
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840760
K. K. Tarafdar, Q. Saifee, V. Gadre
This paper introduces a novel unified neural Multi-Resolution Analysis (MRA) architecture that uses a Discrete Wavelet Transform (DWT)-integrated Convolutional Neural Network (CNN) along with DWT pooling. As the convolution-with-pooling operation in a CNN is equivalent to the filtering-and-downsampling operation in a DWT filter bank, the two are unified to form an end-to-end deep learning wavelet CNN model. The DWT pooling mechanism is also used to further enhance the MRA capability of this wavelet CNN. Using the first two wavelets of the Daubechies family, we present a comprehensive set of improved texture classification results with several updates to the model architecture. These updates to the CNN model architecture apply to any node generally associated with the time-frequency analysis of the input signal.
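A minimal sketch of wavelet pooling on a feature map, assuming PyWavelets and NumPy arrays: a single-level 2-D DWT replaces the usual max-pool, and keeping the approximation subband halves the spatial size while preserving low-frequency structure. The end-to-end integration into the CNN and the learned filters are not shown.

```python
import numpy as np
import pywt

def wavelet_pool(feature_map, wavelet="db2"):
    """Downsample a (channels, H, W) feature map by a single-level 2-D DWT,
    keeping only the approximation (low-low) subband per channel."""
    pooled = []
    for channel in feature_map:
        cA, (cH, cV, cD) = pywt.dwt2(channel, wavelet)
        pooled.append(cA)   # detail subbands could feed further MRA stages instead
    return np.stack(pooled)

# Example: a 3-channel 32x32 map pools to roughly 3x17x17 with 'db2'
# (the exact size depends on the wavelet filter length and padding mode).
x = np.random.rand(3, 32, 32)
print(wavelet_pool(x).shape)
```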
Citations: 0
Morse Wavelet Features for Pop Noise Detection
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840840
Priyanka Gupta, Piyushkumar K. Chodingala, H. Patil
The Spoofed Speech Detection (SSD) problem has been an important one, especially for Automatic Speaker Verification (ASV) systems. However, the techniques used for designing countermeasure systems for the SSD task are attack-specific, and therefore the solutions are far from a generalized SSD system that can detect any type of spoofed speech. On the other hand, Voice Liveness Detection (VLD) systems rely on the characteristics of live speech (i.e., pop noise) to detect whether an utterance is live or not. Given that the attacker has the freedom to mount any type of attack, VLD systems play a crucial role in defending against spoofing attacks, irrespective of the type of spoof used by the attacker. To that effect, we propose Generalized Morse Wavelet (GMW)-based features for VLD, with a Convolutional Neural Network (CNN) as the classifier at the back-end. In this context, we use pop noise as a discriminative acoustic cue to detect live speech. Pop noise is present in live speech signals at low frequencies (typically $\leq 40$ Hz), caused by human breath reaching the closely placed microphone. We show that for $\gamma = 3$, the Morse wavelet has the highest concentration of information, indicated by the smallest area of the Heisenberg box. Hence, we take $\gamma = 3$ for our experiments on Morse wavelets. We compare the performance of our system with the Short-Time Fourier Transform (STFT)-Support Vector Machine (SVM)-based original baseline and other existing systems, such as Constant Q-Transform (CQT)-SVM, STFT-CNN, and bump wavelet-CNN. With an overall accuracy of 86.90% on the evaluation set, our proposed system significantly outperforms the STFT-SVM-based original baseline, CQT-SVM, STFT-CNN, and bump wavelet-CNN by absolute margins of 18.97%, 8.02%, 15.09%, and 12.21%, respectively. Finally, we have also analyzed the effect of various phoneme types on VLD system performance.
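Since pop noise sits below roughly 40 Hz, a simple frame-wise cue, independent of the Morse-wavelet front end used in the paper, is the ratio of low-frequency energy to total spectral energy. The SciPy sketch below is only illustrative; the sampling rate, frame length, and decision threshold are assumptions.

```python
import numpy as np
from scipy.signal import stft

def pop_noise_score(speech, fs=16000, cutoff_hz=40):
    """Frame-wise ratio of energy below cutoff_hz to total energy.
    High values suggest breath 'pop' from a closely placed microphone."""
    f, t, Z = stft(speech, fs=fs, nperseg=1024)
    power = np.abs(Z) ** 2
    low = power[f <= cutoff_hz].sum(axis=0)
    total = power.sum(axis=0) + 1e-12
    return low / total

# Illustrative decision: flag the utterance as live if any frame is pop-dominated.
# is_live = (pop_noise_score(x, fs) > 0.5).any()   # 0.5 is an arbitrary threshold
```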
Citations: 3
Binary Intelligent Reflecting Surfaces Assisted OFDM Systems
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840791
L. Yashvanth, C. Murthy, B. Deepak
Intelligent reflecting surfaces (IRSs) enhance the performance of wireless systems by reflecting the incoming signals towards a desired user, especially in the mmWave bands. However, this requires optimizing the discrete reflection coefficients of the IRS elements, which crucially depends on the availability of accurate channel state information (CSI) for all links in the system. Further, in wideband systems employing orthogonal frequency division multiplexing (OFDM), a given IRS configuration cannot be simultaneously optimal for all subcarriers, and hence the phase optimization is not straightforward. In this paper, we propose a novel IRS phase configuration scheme for OFDM systems by first leveraging the sparsity of the channel in the angular domain to estimate the CSI using the simultaneous orthogonal matching pursuit (SOMP) algorithm, and then devising a novel and computationally efficient binary IRS phase configuration algorithm using majorization-minimization (MM). Simulation results illustrate the efficacy of the approach in comparison with the state-of-the-art.
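A toy NumPy sketch of the binary phase-selection problem for a single subcarrier: each IRS element takes a reflection coefficient in {+1, -1}, and a simple element-wise rule picks the sign whose cascaded contribution adds constructively to the direct path. This is only a heuristic baseline under assumed scalar channels; the paper's SOMP channel estimation and MM-based algorithm, which handles all OFDM subcarriers jointly, are not reproduced here.

```python
import numpy as np

def binary_irs_phases(h_direct, h_cascaded):
    """Pick a +/-1 reflection coefficient per IRS element so that each cascaded
    term adds constructively to the direct channel (single-carrier heuristic).
    h_direct: complex scalar; h_cascaded: complex vector, one entry per element."""
    # Element-wise rule: keep the sign whose projection onto h_direct is positive.
    signs = np.where(np.real(h_cascaded * np.conj(h_direct)) >= 0, 1.0, -1.0)
    effective = h_direct + np.sum(signs * h_cascaded)
    return signs, np.abs(effective) ** 2

# Example with a random 64-element cascaded channel.
rng = np.random.default_rng(0)
h_d = rng.normal() + 1j * rng.normal()
h_c = (rng.normal(size=64) + 1j * rng.normal(size=64)) / np.sqrt(2)
theta, gain = binary_irs_phases(h_d, h_c)
print(gain)
```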
Citations: 0
C-Band Iris Coupled Cavity Bandpass Filter
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840777
Shashank Soi, Sudheer Kumar Singh, Rajendra Singh, Ashok Kumar
This paper presents the design of a compact, tunable, high-rejection 6th-order C-band iris-coupled cavity bandpass filter. The design approach includes the use of Chebyshev low-pass filter prototype elements to calculate the normalized capacitance per unit length between resonators and ground, and also between adjacent resonators. With the help of coupling and tuning screws, the bandwidth and center frequency of the filter can be tuned for the desired performance. The coaxial capacitance formula is used to compute the diameter of the screws. The CST tool is used to simulate and optimize the theoretically calculated physical dimensions to further improve the filter performance and obtain better tolerance sensitivity. Finally, a 6th-order prototype is fabricated and tuned to obtain the desired performance. The cavity design and resonator calculations have been carried out in such a manner that the same hardware can be tuned to both frequency bands, i.e., 4.4-4.6 GHz (Band I) and 4.8-5.0 GHz (Band II), to meet the desired specifications. A prototype is fabricated and experimental validation is presented.
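For reference, the relation behind the screw-diameter calculation mentioned above is the standard per-unit-length capacitance of a coaxial line, with $a$ the inner (screw) radius, $b$ the outer radius, and $\varepsilon_r$ the relative permittivity; how the Chebyshev prototype values are mapped onto these dimensions in the paper is not detailed here: $C' = \dfrac{2\pi\varepsilon_0\varepsilon_r}{\ln(b/a)}$ F/m, so a target capacitance fixes the ratio $b/a$ and hence the screw diameter for a given cavity bore.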
Citations: 0
Hilbert Vibration Decomposition of Seismocardiogram for HR and HRV Estimation
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840838
Moirangthem James Singh, L. Sharma, S. Dandapat
This paper presents a new time-varying decomposition method based on the Hilbert Vibration Decomposition (HVD) for estimating heart rate from a seismocardiogram (SCG). The heart rate (HR) estimation method consists of signal decomposition using the HVD algorithm, heart-rate envelope generation, and peak detection from the smoothed envelope for beat-to-beat interval calculation. We derive the heart rate variability (HRV) metrics from the inter-beat intervals. The method does not require a reference ECG signal. The same signals are also processed with the Empirical Mode Decomposition (EMD) and Variational Mode Decomposition (VMD) methods. To compare these three decomposition methods, the CEBS database from the PhysioNet archive was used for testing and validation. The results show better beat-to-beat interval estimation accuracy with the HVD method than with the others, and HRV metrics are accurately derived using our methodology. The performance results demonstrate that ECG-derived and SCG-derived heartbeats and HRV metrics are comparable for healthy subjects.
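A short SciPy sketch of the last two steps described above, peak picking on the smoothed SCG-derived envelope and time-domain HRV metrics from the resulting inter-beat intervals; the HVD decomposition and envelope construction are not shown, and the minimum peak-distance constraint below is an assumption.

```python
import numpy as np
from scipy.signal import find_peaks

def beats_and_hrv(envelope, fs):
    """Detect beats as peaks of the smoothed SCG-derived envelope and
    compute basic time-domain HRV metrics from the inter-beat intervals."""
    # Assume no more than ~180 bpm, i.e. peaks at least fs/3 samples apart.
    peaks, _ = find_peaks(envelope, distance=int(fs / 3))
    ibi = np.diff(peaks) / fs                  # inter-beat intervals in seconds
    hr_bpm = 60.0 / ibi
    sdnn = np.std(ibi * 1000.0)                # ms
    rmssd = np.sqrt(np.mean(np.diff(ibi * 1000.0) ** 2))
    return {"mean_hr_bpm": hr_bpm.mean(), "SDNN_ms": sdnn, "RMSSD_ms": rmssd}
```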
Citations: 2