
Latest publications: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)

Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840761
Vishwesh Pillai, Pranav Mehar, M. Das, Deep Gupta, P. Radeva
The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Owing to the considerable number of unique dishes across various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers are used, with an analysis of epistemic uncertainty determining when to switch between the classifiers. However, the accuracy of predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic-uncertainty-based predictions. The performance of the proposed method is demonstrated through several experiments on the MAFood-121 dataset. The experimental results validate the proposed method and show that the threshold criteria increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples.
Citations: 1
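The uncertainty-driven switching idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes Monte-Carlo dropout softmax samples as the source of epistemic uncertainty and a single hypothetical threshold (the paper studies three criteria); all names are illustrative.

```python
import numpy as np

def epistemic_uncertainty(mc_probs):
    # mc_probs: (T, C) softmax outputs from T stochastic forward passes
    # (e.g. MC dropout). The mean per-class predictive variance is used
    # here as a simple proxy for epistemic uncertainty.
    return float(np.mean(np.var(mc_probs, axis=0)))

def predict_with_switch(mc_probs_flat, hierarchical_predict, threshold=1e-3):
    # Trust the flat classifier when its epistemic uncertainty is low;
    # otherwise fall back to the hierarchical classifier.
    if epistemic_uncertainty(mc_probs_flat) <= threshold:
        return int(np.argmax(mc_probs_flat.mean(axis=0))), "flat"
    return hierarchical_predict(), "hierarchical"
```

A sample whose stochastic predictions agree closely is resolved by the flat classifier; a sample with high variance across passes is routed to the hierarchical one.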
Investigating Synchronized Optical Ballistocardiography vs Electrocardiography for Pathological and Healthy Adults
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840778
Prachee Priyadarshinee, Yixian Tan, Cindy Ming Ying Lin, Christopher Johann Clarke, T. BalamuraliB., Enyi Tan, V. Tan, S. Chai, C. Yeo, Jer-Ming Chen
We investigated the relationship between optical BCG and ECG signals measured simultaneously for the same heartbeat cycles. Despite the long history of BCG, earlier studies compared BCG and ECG features across large time cycles (inter-heartbeat), but not within the heartbeat cycle (intra-heartbeat). The non-invasively derived BCG signal was found to have a remarkable relationship with the arterial pressure signal, which has not been previously reported. We achieved synchronization of the two disparate modalities to within an estimated uncertainty of 50-70 ms, which allowed us to compare features within the heart cycle (which may be related to the arterial pressure) for one pathological and four healthy subjects lying supine, and found them consistent regardless of breathing condition, gender, and health status. Although not a one-to-one correlation, we show that optical BCG is a convenient, unobtrusive, and complementary modality for monitoring cardiac activity alongside the well-established ECG.
Citations: 0
An Efficient Framework to Automatic Extract EOG Artifacts from Single Channel EEG Recordings
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840849
Murali Krishna Yadavalli, V. K. Pamula
In health care applications, portable electroencephalogram (EEG) systems are frequently used to record and process brain signals because of their ease of use and low cost. The electrooculogram (EOG), arising mainly from eye blinks, is the dominant high-amplitude, low-frequency artifact and can mislead the diagnosis of disease. Hence there is a demand for artifact-removal techniques in portable single-channel EEG devices. This work presents automatic extraction of the EOG artifact by integrating Fluctuation-based Dispersion Entropy (FDispEn) with Singular Spectrum Analysis (SSA) and an adaptive noise canceller (ANC). The proposed model identifies artifact signal components from their entropy values at different SNRs and removes them with the ANC for better performance. Unlike earlier DWT, SSA, and adaptive-SSA methods combined with ANC, this method avoids depending on a threshold to identify the artifact subspace. The proposed method is evaluated on synthetic and real EEG data sets and eliminates the eye-blink artifact while preserving the low-frequency EEG content. It outperforms existing algorithms on the reported performance metrics.
Citations: 2
Sensitivity Analysis of MaskCycleGAN based Voice Conversion for Enhancing Cleft Lip and Palate Speech Recognition
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840769
S. Bhattacharjee, R. Sinha
Cleft lip and palate (CLP) speech arises from a congenital disorder that deforms an individual’s speech; as a result, such speech is not amenable to speech recognition systems. Existing work on CLP speech enhancement uses CycleGAN-VC-based non-parallel voice conversion. However, CycleGAN-VC cannot capture time-frequency structures, which MaskCycleGAN-VC can through a module called time-frequency adaptive normalization; it also has the added advantage of mel-spectrogram conversion rather than mel-spectrum conversion. Converting CLP speech to normal speech increases intelligibility and thereby allows automatic speech recognition systems to predict the uttered sentences, which is increasingly important as speech recognition devices automate everyday life on a large scale. To develop such an assistive technology, it is essential to study the sensitivity of automatic speech recognizers. This work focuses on the sensitivity analysis of a MaskCycleGAN-based voice conversion system with respect to variations in acoustic and gender mismatch.
Citations: 1
Smart Beam Steering with a Slot-Loaded Miniaturized Patch Antenna
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840797
Paramita Saha, J. Das, P. Venkateswaran
In this paper, a smart beam steering technique is investigated with a miniaturized patch antenna at 2.45 GHz. The antenna is miniaturized with an asymmetrically placed slot on the patch that significantly reduces the frequency of operation. At the operating frequency, it radiates with $23\%$ of the area ($0.15\lambda \times 0.11\lambda$, where $\lambda$ is the free-space wavelength at 2.45 GHz) of a conventional patch with the same dielectric properties. A $1 \times 8$ antenna array is also designed with a power divider and phase-shifter circuit to demonstrate beam steering. In the numerical investigation, a maximum steering angle of 220 is observed in the azimuthal plane.
Citations: 0
Extracting Video-Based Breath Signal For Detection of Out-of-breath Speech
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840788
Sibasis Sahoo, S. Dandapat
A cost-effective, video-signal-based breath signal extraction method is described in this work. It does not require any sophisticated instrument; instead, it uses devices such as mobile phones, headphones, and computers that are readily available to an individual. To this end, a new database was created containing read-speech utterances and video signals under neutral and post-exercise (out-of-breath) conditions. For most speakers, the breath signals exhibit higher strength in both the inhalation and exhalation phases of the breathing cycle under out-of-breath conditions. Additionally, the average duration of the breath cycle decreases under the same condition, a reduction driven mainly by the exhalation phase. The ability of the breath features to distinguish the neutral and out-of-breath classes is verified with support vector machine and logistic regression classifiers. After combining the breath features with the MFCC baseline features, the performance of both classifiers in terms of unweighted average recall and F1-score improved to approximately 70%.
Citations: 0
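Since the abstract reports unweighted average recall (UAR), a metric less common than plain accuracy, here is a minimal reference implementation; the function name is illustrative.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    # Mean of per-class recalls: unlike accuracy, every class counts
    # equally regardless of how many samples it has, which matters when
    # neutral and out-of-breath utterances are imbalanced.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

For example, with three samples of class 0 (two correct) and one of class 1 (correct), UAR is (2/3 + 1)/2 = 5/6, while plain accuracy would be 3/4.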
Patch Level Segmentation and Visualization of Capsule Network Inference for Breast Metastases Detection
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840781
Malviya Dutta Richa, Sk. Arif Ahmed, D. P. Dogra, P. Dan
Capsule networks are becoming popular for developing AI-guided medical diagnostic tools. The objective of this paper is to carve out a strategy that solves the dual problems of classification and segmentation of metastatic tissue regions in one single pipeline. To accomplish this, capsule networks with variational Bayes routing are used to classify normal and metastatic tissue regions from breast cancer whole-slide images. Thereafter, a high-level segmentation of the metastatic tissue region is carried out using the classified patches. The results obtained on a set of 75,000 patches show that patch-level segmentation is an efficient method to delineate metastatic regions. From the end-user’s perspective, visualization of results plays a significant role in selecting the appropriate method for an application. Capsule networks mimic the way the human brain works, and clinicians have long demanded that algorithms used for the automatic classification of cancer pathology be interpretable; in clinical practice, such a method is therefore more acceptable. The efficient region segmentation would aid clinicians in readily demarcating the area of interest and the area of most relevance.
Citations: 0
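The patch-level segmentation step, i.e. turning per-patch class decisions into a coarse tissue mask, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the row-major grid layout and all names are assumptions.

```python
import numpy as np

def patch_segmentation(patch_preds, grid_shape, patch_size):
    # patch_preds: per-patch labels (0 = normal, 1 = metastatic) in
    # row-major order over the slide's patch grid. Each label is tiled
    # to a patch_size x patch_size pixel block, giving a coarse
    # segmentation mask at the patch resolution.
    grid = np.asarray(patch_preds).reshape(grid_shape)
    return np.kron(grid, np.ones((patch_size, patch_size), dtype=grid.dtype))
```

Overlaying such a mask on the whole-slide image is what lets a clinician see which regions drove the classifier's decision.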
Smart Device Localization Under α-KMS Fading Environment using Feedback Distance based Gradient Ascent
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840756
Aditya Sing, Ankur Pandey, Sudhir Kumar
In this paper, we propose a novel method for location estimation of smart devices under a generic shadowed $\alpha$-$\kappa$-$\mu$ distribution based $\alpha$-KMS fading environment, which has hitherto not been considered for localization. Most existing path-loss-based methods use only a standard log-normal model for localization; however, fading effects need to be considered to model the Received Signal Strength (RSS) values appropriately. Some localization methods use standard fading models such as Rayleigh, Nakagami-m, and Rician, to name a few; however, such assumptions lead to erroneous location estimates. The generic location estimator is applicable to all environments and provides accurate location estimates given correct estimates of the $\alpha$-$\kappa$-$\mu$ parameters. We propose a feedback-induced gradient ascent algorithm based on feedback distance that maximizes the derived log-likelihood of the actual location. The proposed method also addresses the non-convex nature of the maximum likelihood estimator and is computationally efficient. The performance is evaluated on a simulated testbed, and the localization results outperform existing state-of-the-art methods.
Citations: 1
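Gradient ascent on an RSS log-likelihood can be sketched under a simplified model. The code below assumes a plain log-normal path-loss likelihood rather than the paper's shadowed α-κ-μ model, and uses central-difference numerical gradients instead of the feedback-distance scheme; all parameters and names are illustrative.

```python
import numpy as np

def log_likelihood(pos, anchors, rss, p0=-40.0, n=2.0, sigma=2.0):
    # Log-normal shadowing model: RSS_i ~ N(p0 - 10 n log10(d_i), sigma^2),
    # where d_i is the distance from pos to anchor i.
    d = np.linalg.norm(anchors - pos, axis=1) + 1e-9
    mu = p0 - 10.0 * n * np.log10(d)
    return -np.sum((rss - mu) ** 2) / (2.0 * sigma ** 2)

def locate(anchors, rss, start, lr=0.02, iters=5000):
    # Plain gradient ascent on the log-likelihood surface.
    pos = np.array(start, dtype=float)
    eps = 1e-4
    for _ in range(iters):
        grad = np.zeros(2)
        for j in range(2):
            step = np.zeros(2)
            step[j] = eps
            grad[j] = (log_likelihood(pos + step, anchors, rss)
                       - log_likelihood(pos - step, anchors, rss)) / (2 * eps)
        pos += lr * grad
    return pos
```

With noiseless RSS generated from the same model, the ascent recovers the true position from a nearby starting guess; under real fading, the likelihood is non-convex and the starting point matters, which is the issue the paper's feedback-distance scheme targets.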
Issues in Sub-Utterance Level Language Identification in a Code Switched Bilingual Scenario
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840813
Jagabandhu Mishra, Joshitha Gandra, Vaishnavi Patil, S. Prasanna
Sub-utterance level language identification (SLID) is the automatic process of recognizing the spoken language in a code switched (CS) utterance at the sub-utterance level. The nature of CS utterances suggests that the primary language occupies a significantly larger share of the utterance duration than the secondary. In a CS utterance, a single speaker speaks both languages, so the phoneme-level acoustic characteristics (sub-segmental and segmental evidence) of the secondary language are mostly biased towards the primary. This motivates the hypothesis that an acoustic-based language identification system trained on CS data may perform with a bias towards the primary language. This study confirms the hypothesis by examining the confusion matrices of earlier proposed approaches. At the same time, language discrimination can also be performed at the suprasegmental level by capturing language-specific phonemic temporal evidence. Hence, to resolve the biasing issue, this study proposes a wav2vec2-based approach, which captures suprasegmental phonemic temporal patterns in the pre-training stage and merges them to capture language-specific suprasegmental evidence in the fine-tuning stage. The experimental results show the proposed approach is able to resolve the issue to some extent. As the fine-tuning stage uses a discriminative approach, weighted loss and secondary-language augmentation methods can be explored in the future for further performance improvement.
Index Terms: Code switched (CS) bilingual speech, Sub-utterance level language identification (SLID), wav2vec2, Deepspeech2.
引用次数: 4
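The SLID task described above ultimately produces a language decision per short frame of audio, which must be turned into timed sub-utterance segments. As a hedged illustration of that decoding step only (plain Python; the function names, the 20 ms frame shift, and the label strings are illustrative assumptions, not the paper's code), the sketch below applies majority-vote smoothing to per-frame language labels and then collapses runs of identical labels into `(start_s, end_s, lang)` segments:

```python
from itertools import groupby

def smooth_labels(labels, k=3):
    """Majority-vote smoothing over a k-frame window to suppress spurious switches."""
    half = k // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        out.append(max(set(window), key=window.count))
    return out

def segments_from_frames(labels, frame_shift=0.02):
    """Collapse per-frame language labels into (start_s, end_s, lang) segments."""
    segments, t = [], 0.0
    for lang, run in groupby(labels):
        n = len(list(run))
        segments.append((round(t, 2), round(t + n * frame_shift, 2), lang))
        t += n * frame_shift
    return segments

# A short Hindi/English frame sequence with one spurious one-frame switch:
frames = ["hi", "hi", "en", "hi", "hi", "en", "en", "en"]
print(segments_from_frames(smooth_labels(frames)))
# → [(0.0, 0.1, 'hi'), (0.1, 0.16, 'en')]
```

The smoothing window is what implements the "primary language dominates" prior in a crude form: isolated single-frame flips (often errors biased toward the primary language) are voted away before segmentation.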
EHTNet: Twin-pooled CNN with Empirical Mode Decomposition and Hilbert Spectrum for Acoustic Scene Classification
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840514
Aswathy Madhu, K. Suresh
The objective of Acoustic Scene Classification (ASC) is to help machines identify the unique acoustic characteristics that define an environment. In recent times, Convolutional Neural Networks (CNNs) have contributed significantly to the success of many state-of-the-art ASC frameworks. The overall accuracy of an ASC framework depends on two factors: the signal representation and the learning model. In this work, we address these two factors as follows. First, we propose a time-frequency representation that employs empirical mode decomposition and the Hilbert spectrum for meaningful characterization of the acoustic signal. Second, we introduce EHTNet, a framework for ASC that uses twin-pooled CNNs for classification and the proposed time-frequency representation to characterize the acoustic signal. Experiments on a benchmark ASC dataset indicate that EHTNet outperforms both state-of-the-art ASC approaches and a log mel spectrum-based baseline. Specifically, the proposed framework improves the classification accuracy by 91.04% and the F1-score by 93.61% compared with the baseline.
Citations: 0
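EHTNet's front end rests on the Hilbert spectrum: after EMD decomposes the signal into intrinsic mode functions (IMFs), each IMF's analytic signal yields an instantaneous amplitude and frequency, and these are binned into a time-frequency image. The sketch below illustrates only the Hilbert step for a single component (pure Python with a naive O(n²) DFT for brevity; the EMD sifting is omitted, and this is an illustrative reconstruction, not the paper's implementation):

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic_signal(x):
    """Analytic signal via the standard frequency-domain construction:
    keep DC (and Nyquist for even n), double positive frequencies, zero the rest."""
    n = len(x)
    X = dft(x)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    return idft([X[k] * h[k] for k in range(n)])

def inst_freq(z, fs):
    """Instantaneous frequency (Hz) from wrapped phase differences of the analytic signal."""
    phase = [cmath.phase(v) for v in z]
    freqs = []
    for i in range(1, len(phase)):
        d = phase[i] - phase[i - 1]
        while d <= -math.pi:   # wrap the difference into (-pi, pi]
            d += 2 * math.pi
        while d > math.pi:
            d -= 2 * math.pi
        freqs.append(d * fs / (2 * math.pi))
    return freqs

# An 8 Hz cosine sampled at 64 Hz: its instantaneous frequency should be 8 Hz throughout.
fs = 64
x = [math.cos(2 * math.pi * 8 * t / fs) for t in range(fs)]
z = analytic_signal(x)
print(round(sum(inst_freq(z, fs)) / (fs - 1), 3))
# → 8.0
```

In the full pipeline, the instantaneous amplitude |z(t)| of every IMF would be accumulated into (time, frequency) bins to form the Hilbert spectrum image fed to the twin-pooled CNN.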
Journal
2022 IEEE International Conference on Signal Processing and Communications (SPCOM)