
Latest publications: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)

SPCOM 2022 Cover Page
Pub Date : 2022-07-11 DOI: 10.1109/spcom55316.2022.9840800
Citations: 0
Low-level Bias discovery and Mitigation for Image Classification
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840811
Vartika Sengar, S. VivekB., Gaurab Bhattacharya, J. Gubbi, Arpan Pal, P. Balamuralidhar
Identification of bias and its mitigation in a classifier is a fundamental sanity check required in trustworthy AI systems. Many methods in the literature mitigate bias by using it as a priori information. In this work, we propose a system that can detect low-level bias (e.g., color, texture) and mitigate it. A novel auto-encoder architecture is built to explain the predictions made by a deep neural network, which helps in identifying the bias. The auto-encoder is trained to produce a generalized representation of the input image by decomposing it into a set of latent embeddings. These embeddings are learned by specializing groups of higher-dimensional feature maps to capture disentangled color and shape concepts. The shape embeddings are trained to reconstruct the discrete wavelet transform components of an image, and the color embeddings are trained to capture the color information. Feature specialization is achieved by reconstructing the RGB image from the shape embeddings modulated by the color embeddings. We show that these representations can be used to detect low-level bias in a classification task. After detecting bias, we also propose a method to de-bias the classifier by training it with counterfactual images generated by manipulating the representations learned by the auto-encoder. Our proposed method of bias discovery and mitigation achieves state-of-the-art results on ColorMNIST and the newly proposed BiasedShape dataset.
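The shape embeddings above are trained to reconstruct discrete wavelet transform components of the image. As an illustration of what those components look like, here is a minimal single-level 2D Haar DWT in NumPy; this is an assumed, simplified stand-in for whatever wavelet family the authors actually use. The detail subbands (LH, HL, HH) carry the edge/shape structure, while LL keeps the coarse content:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar DWT: returns (LL, LH, HL, HH) subbands.

    img: 2-D array with even height and width. Averaging/differencing is
    done first along rows, then along columns (normalized by 1/2 each)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0      # coarse approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal detail
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical detail
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return LL, LH, HL, HH
```

With this normalization, a constant image produces a constant LL band and all-zero detail bands, which is why shape (edge) information concentrates in LH/HL/HH.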
Citations: 1
Temporal Surgical Gesture Segmentation and Classification in Multi-gesture Robotic Surgery using Fine-tuned features and Calibrated MS-TCN
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840779
Snigdha Agarwal, Chakka Sai Pradeep, N. Sinha
Temporal gesture segmentation is an active research problem for many applications such as surgical skill assessment, surgery training, and robotic training. In this paper, we propose a novel two-step method for gesture segmentation on untrimmed surgical videos of the challenging JIGSAWS dataset. We train and evaluate our method on 39 videos of the Suturing task, which involves 10 gestures. Gesture length ranges from 1 second to 75 seconds, and full video length varies from 1 minute to 5 minutes. In step one, we extract encoded frame-wise spatio-temporal features at the full temporal resolution of the untrimmed videos. In step two, we use these extracted features to identify gesture segments for temporal segmentation and classification. To extract high-quality features from the surgical videos, we also pre-train gesture classification models on the JIGSAWS dataset via transfer learning, using two state-of-the-art pretrained backbone architectures. For segmentation, we propose an improved calibrated MS-TCN (CMS-TCN) that introduces a smoothed focal loss as the loss function, which helps regularize the TCN and avoid over-confident decisions. We achieve a frame-wise accuracy of 89.8% and an Edit Distance score of 91.5%, an improvement of 2.2% over previous works. We also propose a novel evaluation metric that normalizes, in a single score, the effect of correctly classifying the frames of larger segments versus smaller segments.
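The abstract names a "smoothed focal loss" without defining it. A plausible NumPy sketch, assuming it combines label smoothing with the standard focal modulation factor (1 - p)^gamma; the paper's exact formulation may differ:

```python
import numpy as np

def smoothed_focal_loss(probs, labels, gamma=2.0, eps=0.1):
    """Cross-entropy with label smoothing, modulated by a focal factor.

    probs:  (T, C) per-frame class probabilities.
    labels: (T,) integer targets."""
    T, C = probs.shape
    # Label smoothing: (1 - eps) extra mass on the target, eps/C spread uniformly.
    target = np.full((T, C), eps / C)
    target[np.arange(T), labels] += 1.0 - eps
    logp = np.log(np.clip(probs, 1e-12, 1.0))
    focal = (1.0 - probs) ** gamma   # down-weights already-confident predictions
    return float(-(target * focal * logp).sum(axis=1).mean())
```

The focal factor shrinks the gradient for frames the model already gets right with high confidence, which is one way to discourage the over-confident decisions the abstract mentions.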
Citations: 1
A Hierarchical Approach for Decoding Human Reach-and-Grasp Activities based on EEG Signals
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840794
Bhagyasree Kanuparthi, A. Turlapaty
Physically disabled patients, such as the paralyzed, amputees, and stroke patients, find it difficult to perform daily activities on their own. A Brain-Computer Interface (BCI) using Electroencephalography (EEG) signals is an option for the rehabilitation of these patients. BCI function can be enhanced by decoding limb movements for intuitive control of a prosthetic arm. However, decoding them with traditional classifiers is a challenging task. In this paper, a two-stage hierarchical framework is proposed for decoding reach-and-grasp actions. In stage 1, action signals are separated from rest segments based on power spectral density features and a fine k-nearest neighbor (FKNN) classifier. In stage 2, the signals identified as actions are further classified into palmar- and lateral-type reach-and-grasp actions using mean absolute value features with the FKNN classifier. In comparison with existing classifiers, the proposed method achieves a superior test accuracy of 85.38%.
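A toy version of the stage-1 pipeline, with hedges: band-averaged periodogram power stands in for the paper's PSD features, and a plain k-NN stands in for the "fine" KNN variant (which in common usage just means k = 1 with fine-grained distinctions; the generic form is shown here):

```python
import numpy as np

def psd_features(x, n_bands=4):
    """Mean periodogram power in n_bands equal-width frequency bands."""
    p = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    return np.array([band.mean() for band in np.array_split(p, n_bands)])

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour majority vote in feature space."""
    d = np.linalg.norm(train_X - x, axis=1)
    votes = train_y[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()
```

A low-frequency "rest-like" signal and a high-frequency "action-like" signal land in different periodogram bands, so even this crude feature separates the two classes.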
Citations: 0
AbS for ASR: A New Computational Perspective
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840830
V. R. Lakkavalli
In this paper, the classical paradigm of analysis by synthesis (AbS) for automatic speech recognition (ASR) is revisited to enhance ASR performance. Although the AbS paradigm holds promise for explaining the process of perception as proposed in Motor Theory, many challenges remain to be addressed before a practical ASR system can be realized on its basis. In this paper, i) a general architecture for ASR using AbS is presented; and ii) a new AbS trellis is proposed to realize the AbS loop, combining transition (coarticulation) cost and classification cost to search for the best decoding path. Initial results on the TIMIT database show that substitution errors may be reduced by employing AbS. This shows promise for using AbS in ASR, and the results further highlight the need to identify an invariant phonetic representation space, a better distance metric (or coarticulation modelling), and a better synthesizer.
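The trellis search the abstract describes, combining a per-frame classification cost with a transition (coarticulation) cost, is at its core the familiar Viterbi-style minimum-cost path. A sketch over an assumed (T frames) × (N units) cost trellis; the actual AbS loop also involves re-synthesis and comparison, which is omitted here:

```python
import numpy as np

def best_path(class_cost, trans_cost):
    """Minimum-cost path through a frame-by-unit trellis via dynamic programming.

    class_cost: (T, N) per-frame classification cost for each of N units.
    trans_cost: (N, N) cost of moving from unit i to unit j between frames."""
    T, N = class_cost.shape
    D = np.zeros((T, N))                 # best cumulative cost ending in each unit
    back = np.zeros((T, N), dtype=int)   # backpointers for path recovery
    D[0] = class_cost[0]
    for t in range(1, T):
        total = D[t - 1][:, None] + trans_cost + class_cost[t][None, :]
        back[t] = total.argmin(axis=0)
        D[t] = total.min(axis=0)
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With zero transition cost this reduces to picking the per-frame argmin; a nonzero coarticulation cost penalizes implausible unit-to-unit jumps.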
Citations: 0
Morse Wavelet Features for Pop Noise Detection
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840840
Priyanka Gupta, Piyushkumar K. Chodingala, H. Patil
The Spoofed Speech Detection (SSD) problem is an important one, especially for Automatic Speaker Verification (ASV) systems. However, the techniques used for designing countermeasure systems for the SSD task are attack-specific, and the resulting solutions are far from a generalized SSD system that can detect any type of spoofed speech. On the other hand, Voice Liveness Detection (VLD) systems rely on a characteristic of live speech (namely, pop noise) to detect whether an utterance is live. Given that an attacker is free to mount any type of attack, VLD systems play a crucial role in defending against spoofing attacks, irrespective of the type of spoof used. To that end, we propose Generalized Morse Wavelet (GMW)-based features for VLD, with a Convolutional Neural Network (CNN) as the back-end classifier. In this context, we use pop noise as a discriminative acoustic cue to detect live speech. Pop noise is present in live speech signals at low frequencies (typically ≤ 40 Hz), caused by human breath reaching the closely placed microphone. We show that for γ = 3, the Morse wavelet has the highest concentration of information, indicated by the smallest area of the Heisenberg box. Hence, we take γ = 3 for our experiments on Morse wavelets. We compare the performance of our system with the Short-Time Fourier Transform (STFT)-Support Vector Machine (SVM) original baseline and other existing systems, such as Constant Q-Transform (CQT)-SVM, STFT-CNN, and bump wavelet-CNN. With an overall accuracy of 86.90% on the evaluation set, our proposed system significantly outperforms the STFT-SVM original baseline, CQT-SVM, STFT-CNN, and bump wavelet-CNN by absolute margins of 18.97%, 8.02%, 15.09%, and 12.21%, respectively. Finally, we also analyze the effect of various phoneme types on VLD system performance.
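For reference, the Generalized Morse Wavelet is usually defined in the frequency domain (up to a normalization constant) as Ψ(ω) = ω^β exp(−ω^γ) for ω > 0, with its peak at ω = (β/γ)^(1/γ). A quick NumPy check; β = 3 is chosen here purely for illustration, since the abstract fixes γ = 3 but does not state β:

```python
import numpy as np

def morse_wavelet_freq(omega, beta=3.0, gamma=3.0):
    """Unnormalized frequency-domain Generalized Morse Wavelet.

    Analytic wavelet: zero for omega <= 0, omega**beta * exp(-omega**gamma) otherwise."""
    omega = np.asarray(omega, dtype=float)
    psi = np.zeros_like(omega)
    pos = omega > 0
    psi[pos] = omega[pos] ** beta * np.exp(-omega[pos] ** gamma)
    return psi
```

Setting the derivative of ω^β exp(−ω^γ) to zero gives the peak frequency (β/γ)^(1/γ), so with β = γ the wavelet peaks at ω = 1.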
Citations: 3
Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840761
Vishwesh Pillai, Pranav Mehar, M. Das, Deep Gupta, P. Radeva
The problem of food image recognition is an essential one in today's context, because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person's diet. Several models are available to recognize food images and automate this process. Due to the considerable number of unique dishes and varied cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers are used, with an analysis of epistemic uncertainty to switch between the classifiers. However, the accuracy of predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that help increase the accuracy of epistemic-uncertainty-based predictions. The performance of the proposed method is demonstrated through several experiments on the MAFood-121 dataset. The experimental results validate the proposed scheme and show that the threshold criteria help increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples.
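The abstract does not spell out its three threshold criteria. One common way to operationalize uncertainty-based switching, sketched here purely as an assumption and not as the paper's method, is to threshold the predictive entropy of Monte-Carlo (e.g., MC-dropout) softmax outputs and fall back to the hierarchical classifier when the entropy is high:

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Uncertainty estimate from Monte-Carlo forward passes.

    mc_probs: (S, C) softmax outputs from S stochastic passes of one image."""
    mean_p = mc_probs.mean(axis=0)
    return float(-(mean_p * np.log(np.clip(mean_p, 1e-12, 1.0))).sum())

def choose_classifier(mc_probs, threshold=0.5):
    """Route to the flat classifier when uncertainty is low, else hierarchical.

    The threshold value 0.5 is an arbitrary illustration, not from the paper."""
    return "flat" if predictive_entropy(mc_probs) < threshold else "hierarchical"
```

A uniform softmax over C classes has entropy ln C (the maximum), while a one-hot-like output has entropy near zero, so the threshold splits confident from ambiguous samples.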
Citations: 1
C-Band Iris Coupled Cavity Bandpass Filter
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840777
Shashank Soi, Sudheer Kumar Singh, Rajendra Singh, Ashok Kumar
This paper presents the design of a compact, tunable, high-rejection 6th-order C-band iris coupled cavity bandpass filter. The design approach uses Chebyshev low-pass filter prototype elements to calculate the normalized capacitance per unit length between each resonator and ground, and between adjacent resonators. With the help of coupling and tuning screws, the bandwidth and center frequency of the filter can be tuned for the desired performance. The coaxial capacitance formula is used to compute the diameter of the screws. The CST tool is used to simulate and optimize the theoretically calculated physical dimensions, further improving the filter performance and obtaining better tolerance sensitivity. The cavity design and resonator calculations have been carried out such that the same hardware can be tuned to both frequency bands, i.e., 4.4-4.6 GHz (Band I) and 4.8-5.0 GHz (Band II), to meet the desired specifications. Finally, a 6th-order prototype is fabricated, tuned to the desired performance, and experimentally validated.
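The Chebyshev low-pass prototype element values g1..gn that seed such a coupled-cavity design follow a standard closed-form recursion found in microwave filter texts. A NumPy sketch of that textbook recursion; the paper's specific ripple level is not stated in the abstract, so the 0.5 dB used in the example is illustrative:

```python
import numpy as np

def chebyshev_lowpass_prototype(n, ripple_db):
    """Element values g1..gn of the Chebyshev low-pass prototype (g0 = 1).

    n: filter order; ripple_db: passband ripple in dB."""
    beta = np.log(1.0 / np.tanh(ripple_db / 17.37))   # ln(coth(L_Ar / 17.37))
    gam = np.sinh(beta / (2.0 * n))
    a = np.array([np.sin((2 * k - 1) * np.pi / (2 * n)) for k in range(1, n + 1)])
    b = np.array([gam ** 2 + np.sin(k * np.pi / n) ** 2 for k in range(1, n + 1)])
    g = [2.0 * a[0] / gam]
    for k in range(2, n + 1):
        g.append(4.0 * a[k - 2] * a[k - 1] / (b[k - 2] * g[-1]))
    return np.array(g)
```

For odd n the element values are symmetric (g1 = gn), a useful sanity check; the n = 3, 0.5 dB case reproduces the familiar tabulated values 1.5963, 1.0967, 1.5963.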
Citations: 0
Binary Intelligent Reflecting Surfaces Assisted OFDM Systems
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840791
L. Yashvanth, C. Murthy, B. Deepak
Intelligent reflecting surfaces (IRSs) enhance the performance of wireless systems by reflecting incoming signals towards a desired user, especially in the mmWave bands. However, this requires optimizing the discrete reflection coefficients of the IRS elements, which crucially depends on the availability of accurate channel state information (CSI) for all links in the system. Further, in wideband systems employing orthogonal frequency division multiplexing (OFDM), a given IRS configuration cannot be simultaneously optimal for all subcarriers, so the phase optimization is not straightforward. In this paper, we propose a novel IRS phase configuration scheme for OFDM systems: we first leverage the sparsity of the channel in the angular domain to estimate the CSI using the simultaneous orthogonal matching pursuit (SOMP) algorithm, and then devise a novel, computationally efficient binary IRS phase configuration algorithm based on majorization-minimization (MM). Simulation results illustrate the efficacy of the approach in comparison with the state of the art.
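The MM-based binary design itself is beyond an abstract, but the narrowband problem it addresses is easy to state: pick each 1-bit reflection coefficient θn ∈ {+1, −1} (phase 0 or π) to maximize the cascaded gain |Σ θn hn gn|. A naive coordinate-ascent baseline, explicitly not the paper's MM algorithm and ignoring the OFDM/wideband aspect:

```python
import numpy as np

def binary_irs_phases(h, g):
    """Greedy 1-bit phase selection per IRS element to boost |sum(theta * h * g)|.

    h, g: complex per-element channels (BS->IRS and IRS->user).
    Starts from all-+1 and flips each element if that increases the gain,
    so the result is never worse than the all-+1 configuration."""
    casc = h * g                       # cascaded per-element channel
    theta = np.ones(len(casc))
    for i in range(len(casc)):
        rest = (theta * casc).sum() - theta[i] * casc[i]
        theta[i] = 1.0 if abs(rest + casc[i]) >= abs(rest - casc[i]) else -1.0
    return theta
```

Each update keeps the better of the two phase choices given the other elements, so the objective is monotonically non-decreasing; MM methods solve the same discrete problem with convergence guarantees at lower complexity for large surfaces.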
Citations: 0
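The CSI-estimation step in the abstract above rests on simultaneous orthogonal matching pursuit: all OFDM subcarrier channels share one angular support, so a row-sparse recovery over a steering-vector dictionary identifies the propagation paths jointly for every subcarrier. Below is a minimal NumPy sketch of generic SOMP, not the paper's full pipeline (the MM-based binary phase design is omitted); the dictionary, dimensions, and sparsity level are illustrative assumptions.

```python
import numpy as np

def somp(Y, A, k):
    """Simultaneous OMP: recover a row-sparse X with Y = A @ X.

    Y : (m, L) measurements -- L subcarriers sharing one angular support.
    A : (m, n) dictionary, e.g. steering vectors on an angle grid.
    k : assumed number of propagation paths (row-sparsity level).
    """
    residual = Y.copy()
    support = []
    X_s = np.zeros((0, Y.shape[1]), dtype=complex)
    for _ in range(k):
        # pick the atom whose correlation with the residual,
        # aggregated over all subcarriers, is largest
        corr = np.linalg.norm(A.conj().T @ residual, axis=1)
        if support:
            corr[support] = 0.0  # never reselect an atom
        support.append(int(np.argmax(corr)))
        A_s = A[:, support]
        # least-squares refit on the current support, jointly over subcarriers
        X_s, *_ = np.linalg.lstsq(A_s, Y, rcond=None)
        residual = Y - A_s @ X_s
    X = np.zeros((A.shape[1], Y.shape[1]), dtype=complex)
    X[support] = X_s
    return X, sorted(support)
```

Because the refit is shared across all `L` columns, the recovered support is common to every subcarrier, which is exactly the angular-domain structure the scheme exploits.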
A unified neural MRA architecture combining wavelet CNN and wavelet pooling for texture classification
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840760
K. K. Tarafdar, Q. Saifee, V. Gadre
This paper introduces a novel unified neural Multi-Resolution Analysis (MRA) architecture that uses Discrete Wavelet Transform (DWT) integrated Convolutional Neural Network (CNN) along with DWT pooling. As convolution with pooling operation in CNN has equivalence with filtering and downsampling operation in a DWT filter bank, both are unified to form an end-to-end deep learning wavelet CNN model. The DWT pooling mechanism is also used to further enhance the MRA capability of this wavelet CNN. Using the first two wavelets of the Daubechies family, we present here a comprehensive set of improved texture classification results with several updates in the model architecture. These updates in the CNN model architecture apply to any node generally associated with the time-frequency analysis of the input signal.
{"title":"A unified neural MRA architecture combining wavelet CNN and wavelet pooling for texture classification","authors":"K. K. Tarafdar, Q. Saifee, V. Gadre","doi":"10.1109/SPCOM55316.2022.9840760","DOIUrl":"https://doi.org/10.1109/SPCOM55316.2022.9840760","url":null,"abstract":"This paper introduces a novel unified neural Multi-Resolution Analysis (MRA) architecture that uses Discrete Wavelet Transform (DWT) integrated Convolutional Neural Network (CNN) along with DWT pooling. As convolution with pooling operation in CNN has equivalence with filtering and downsampling operation in a DWT filter bank, both are unified to form an end-to-end deep learning wavelet CNN model. The DWT pooling mechanism is also used to further enhance the MRA capability of this wavelet CNN. Using the first two wavelets of the Daubechies family, we present here a comprehensive set of improved texture classification results with several updates in the model architecture. These updates in the CNN model architecture apply to any node generally associated with the time-frequency analysis of the input signal.","PeriodicalId":246982,"journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123038488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
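The unification claimed in the abstract above hinges on a simple identity: one DWT analysis step is a short convolution followed by stride-2 downsampling, i.e., the same primitive a strided CNN layer computes. A small NumPy sketch with the Haar (Daubechies-1) filters — illustrative only, not the paper's model, which uses learned convolutions and deeper Daubechies wavelets:

```python
import numpy as np

def haar_dwt_step(x):
    """One Haar DWT level as convolution + stride-2 downsampling.

    This filter-then-downsample form is exactly what a stride-2 CNN
    convolution computes, which is the equivalence that lets the DWT
    filter bank and the CNN be unified into one model.
    """
    lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # scaling (low-pass) filter
    hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # wavelet (high-pass) filter
    approx = np.convolve(x, lo)[1::2]  # filter, then keep every 2nd sample
    detail = np.convolve(x, hi)[1::2]
    return approx, detail
```

Here `approx` plays the role of wavelet pooling (an energy-preserving, invertible downsampling), while `detail` retains the high-frequency content that ordinary max pooling discards — the texture cues the classifier benefits from.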