
Latest publications from the 2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)

Vision-Guided Speaker Embedding Based Speech Separation
Yuanjie Deng, Ying Wei
Speech is easily degraded by the environment and noise, whereas the visual information corresponding to a speaker, such as lip movement and facial appearance, is more robust. In this paper, a vision-guided speaker-embedding-based speech separation framework is proposed for the mixed-speech separation scenario. The speaker embedding is integrated on the basis of visual guidance. Specifically, we propose two schemes for extracting the speaker embedding: using clean additional speech from the speakers in a one-stage network, and using the speech separated in the first stage of a two-stage network. The two-stage scheme avoids the limitation of requiring clean additional speech: it extracts speaker information from progressively cleaner speech during separation, a continuous self-improvement process. Effective speaker embeddings can therefore be extracted even when only mixed speech is available, which is more practical in real-world scenarios. We conducted comparative experiments on the public VoxCeleb2 dataset and demonstrated the effectiveness of the proposed method.
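No code accompanies the abstract; the PyTorch sketch below only illustrates the two-stage idea: visual features guide a first-pass separation, and a speaker embedding pooled from that estimate conditions a refinement pass. All module names, layer choices, and dimensions (`TwoStageSeparator`, `visual_enc`, `feat_dim`, and so on) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStageSeparator(nn.Module):
    """Hypothetical two-stage separator; the decoder back to waveforms and
    the training losses are omitted."""
    def __init__(self, feat_dim=256, emb_dim=128):
        super().__init__()
        self.visual_enc = nn.GRU(512, feat_dim, batch_first=True)  # lip/face features as guidance
        self.audio_enc = nn.Conv1d(1, feat_dim, kernel_size=16, stride=8)
        self.stage1 = nn.GRU(2 * feat_dim, feat_dim, batch_first=True)
        self.mask1 = nn.Linear(feat_dim, feat_dim)
        self.spk_emb = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        self.stage2 = nn.GRU(2 * feat_dim + emb_dim, feat_dim, batch_first=True)
        self.mask2 = nn.Linear(feat_dim, feat_dim)

    def forward(self, mix_wave, visual_feats):
        a = self.audio_enc(mix_wave.unsqueeze(1)).transpose(1, 2)  # (B, T, F)
        v, _ = self.visual_enc(visual_feats)                       # (B, T_v, F)
        v = nn.functional.interpolate(                             # align video rate to audio rate
            v.transpose(1, 2), size=a.size(1)).transpose(1, 2)
        h1, _ = self.stage1(torch.cat([a, v], dim=-1))
        est1 = a * torch.sigmoid(self.mask1(h1))                   # stage-1 separated features
        emb = self.spk_emb(est1.mean(dim=1))                       # embedding from separated speech,
        emb_t = emb.unsqueeze(1).expand(-1, a.size(1), -1)         # so no clean enrollment is needed
        h2, _ = self.stage2(torch.cat([a, v, emb_t], dim=-1))
        return a * torch.sigmoid(self.mask2(h2))                   # refined stage-2 estimate
```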
{"title":"Vision-Guided Speaker Embedding Based Speech Separation","authors":"Yuanjie Deng, Ying Wei","doi":"10.1109/CISP-BMEI56279.2022.9980110","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980110","url":null,"abstract":"Speech is more affected by the environment and noise, while the visual information corresponding to the speaker, such as lip movement and facial appearance are more robust. In this paper, a vision-guided speaker embedding based speech separation framework is proposed for the scenario of mixed speech separation. The speaker embedding is integrated on the basis of visual guidance. Specifically, we proposed two schemes to extract speaker embedding: using the clean additional speech of the speakers in a one-stage network, and using the separated speech at the first stage in a two-stage network. The two-stage scheme avoids the limitation of using clean additional speech. It utilizes gradually clean speech during the separation to extract the speech information, which is a continuous self-improvement process. Therefore, effective speaker embedding can be extracted even when only mixed speech is present. This is more practical in real-world scenarios. We conducted comparative experiments on the public dataset VoxCeleb2 and demonstrated the effectiveness of the proposed method.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114186118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
LP3DAM: Lightweight Parallel 3D Attention Module for Violence Detection
Jiehang Deng, Yusheng Zheng, Wei Wang, Kunkun Xiong, Kun Zou
Recent studies have shown that adding attention mechanisms to deep convolutional neural networks can effectively improve network performance, but attention mechanisms for the field of violence detection remain undeveloped, mainly because violence detection uses 3D convolutional networks. At present, most attention modules are designed only for 2D convolution, and they are made increasingly complex to obtain better network performance, which inevitably increases the complexity of the network model. To balance network performance against complexity, and to explore the effectiveness and feasibility of attention mechanisms in 3D convolutional network models, this paper proposes the Lightweight Parallel 3D Attention Module (LP3DAM), which greatly improves model accuracy while adding only a small number of parameters. Experiments show that LP3DAM has a positive effect on 3D lightweight convolutional networks, raising the accuracy of the MiNet-3D network on the Hockey, Crowd, and RWF-2000 datasets by 1.44%, 4.84%, and 0.71%, respectively. The number of parameters added to the original network is kept within 1K, and the increase in FLOPs is held to about 0.26M.
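The abstract does not spell out LP3DAM's internals, so the sketch below shows one plausible lightweight parallel design under that reading: a squeeze-and-excitation-style channel branch and a pooled spatial branch applied in parallel to a 5-D video tensor. Names, the reduction ratio, and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class LP3DAMSketch(nn.Module):
    """Illustrative parallel channel/spatial attention for video tensors
    shaped (B, C, T, H, W); not the published LP3DAM configuration."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(              # channel branch (SE-style)
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1, bias=False),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(             # spatial branch on pooled maps
            nn.Conv3d(2, 1, kernel_size=7, padding=3, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        ca = self.channel_fc(x)                        # (B, C, 1, 1, 1) channel weights
        sa = self.spatial_conv(torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1))
        return x * ca + x * sa                         # branches run in parallel, then sum
```

Summing the two reweighted tensors keeps the branches parallel rather than sequential, which is what keeps the added parameter count small, in the spirit of the abstract's sub-1K budget.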
{"title":"LP3DAM: Lightweight Parallel 3D Attention Module for Violence Detection","authors":"Jiehang Deng, Yusheng Zheng, Wei Wang, Kunkun Xiong, Kun Zou","doi":"10.1109/CISP-BMEI56279.2022.9979818","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979818","url":null,"abstract":"Recent studies have shown that the attention mechanism added to the deep convolutional neural network can effectively improve the network performance, but the attention mechanism applied to the field of violence detection has not been developed. The main reason is that violence detection uses 3D convolution network. At present, most attention modules are only suitable for 2D convolution, and these modules are designed as more complex modules to obtain better network performance, which inevitably increases the complexity of the network model. In order to overcome the trade-off between network performance and complexity, and explore the effectiveness and feasibility of attention mechanism in 3D convolutional network model, this paper proposes Lightweight Parallel 3D Attention Module (LP3DAM), which greatly improves the accuracy of the model by adding a small amount of parameters. Experiments show that LP3DAM has a positive effect on 3D lightweight convolutional networks, which makes the accuracy of the network (MiNet-3D) on the three datasets of Hockey, Crowd and RWF-2000 increase by 1.44%, 4.84% and 0.71%, respectively. The number of parameters added to the original network is controlled within 1K, and the increase of Flops is controlled at about 0.26M.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121195279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Retrosynthesis Prediction Based on Graph Relation Network
Zhaoxu Dong, Zhao Chen, Qian Wang
Retrosynthetic analysis is one of the most basic and commonly used methods for planning compound synthesis routes; within it, single-step synthesis prediction is the basis for predicting the synthesis route of a whole compound. With the wide application of computers across disciplines, computer-aided retrosynthesis is becoming more and more common, and the rise of artificial intelligence has led more and more researchers to apply purely data-driven deep learning models to retrosynthetic methods. Many deep-learning-based methods address single-step retrosynthetic prediction, but an end-to-end method using graph convolutional neural networks for this prediction has been lacking. In this paper, we propose a template-based graph relation network for predicting single-step synthesis of compounds. The model learns encodings of molecules and templates to predict whether a relationship exists between them, so the reactants it predicts for target molecules are highly interpretable. In addition, in our experiments we used a new dataset containing a wide variety of reaction and template data, further verifying the practicability of the model.
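As a rough illustration of the template-based idea (encode the product molecule as a graph, encode candidate templates, and score molecule-template relations), here is a minimal sketch; the real network's message passing, readout, and template encoding are not described in the abstract, so everything below is assumed.

```python
import torch
import torch.nn as nn

class RelationScorerSketch(nn.Module):
    """Hypothetical molecule-template relation scorer."""
    def __init__(self, node_dim=32, hid=64, n_templates=100):
        super().__init__()
        self.gc1 = nn.Linear(node_dim, hid)
        self.gc2 = nn.Linear(hid, hid)
        self.template_emb = nn.Embedding(n_templates, hid)
        self.score = nn.Bilinear(hid, hid, 1)

    def forward(self, node_feats, adj, template_ids):
        # two rounds of neighborhood averaging stand in for a real GCN
        h = torch.relu(self.gc1(adj @ node_feats))
        h = torch.relu(self.gc2(adj @ h))
        mol = h.mean(dim=0, keepdim=True)                   # graph readout -> molecule code
        t = self.template_emb(template_ids)                 # (n_templates, hid)
        return self.score(mol.expand_as(t), t).squeeze(-1)  # relation logit per template
```

Training would pair each product molecule with its ground-truth template as a positive relation and sample other templates as negatives.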
{"title":"Retrosynthesis Prediction Based on Graph Relation Network","authors":"Zhaoxu Dong, Zhao Chen, Qian Wang","doi":"10.1109/CISP-BMEI56279.2022.9979857","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979857","url":null,"abstract":"Retrosynthetic analysis is one of the most basic and commonly used methods for compound synthesis routes planning. In the process, the single-step synthesis prediction is the basis for predicting the synthesis route of the whole compound. With the wide application of computers in various disciplines, the use of computer-aided retrosynthetic process is becoming more and more common. The rise of artificial intelligence also makes more and more people apply pure data-driven deep learning models to retrosynthetic methods. At present, there are many deep learning-based methods to solve the problem of single-step retrosynthetic prediction. However, there is a lack of an end-to-end method using graph convolutional neural network for prediction. In this paper, we propose a template-based graph relation network for the prediction of single-step synthesis of compounds. The model can learn the coding of molecules and templates to predict whether there is a relationship between them. Therefore, the reactants of target molecules predicted by this model have great interpretability. In addition, in this experiment, we used a new dataset, which has a variety of reaction and template data, and further verified the practicability of the model.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123871839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Recognizing the consciousness states of DOC patients by classifying EEG signal
Junjie An, Chaoqun Weng, Chenghua Wang, Zhihua Huang
Chronic disorders of consciousness (DOC) refer to brain damage, caused by various factors, that reduces or eliminates patients' ability to perceive stimuli from the environment and from themselves. DOC includes the vegetative state / unresponsive wakefulness syndrome (VS/UWS) and the minimally conscious state (MCS), and much research has addressed the automatic classification of VS and MCS patients. In this study, we propose an automatic state classification method based on machine learning. First, 34 features are extracted from the EEG signal using time-domain, frequency-domain, time-frequency-domain, and nonlinear analysis methods. An eXtreme Gradient Boosting (XGBoost) classifier is then built on the extracted feature vectors and applied to the collected dataset for state classification. The dataset comprises EEG recordings of 12 subjects (DOC patients and normal controls) collected by Fujian Sanbo Funeng Brain Hospital, used to verify the feasibility and effectiveness of the proposed method. Experimental results show that the proposed method classifies VS, MCS, and normal-state subjects with 99.91% accuracy.
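A compact sketch of the pipeline, using a handful of stand-in features rather than the full 34 and random data in place of the clinical EEG, might look like this (the `xgboost` and `scipy` calls are real APIs; the feature list and epoch shapes are illustrative):

```python
import numpy as np
from scipy import signal, stats
from xgboost import XGBClassifier

def eeg_features(epoch, fs=250):
    """A few time- and frequency-domain features; placeholders for the 34 used."""
    feats = [epoch.mean(), epoch.std(), stats.skew(epoch), stats.kurtosis(epoch)]
    f, psd = signal.welch(epoch, fs=fs, nperseg=fs)
    for lo, hi in [(1, 4), (4, 8), (8, 13), (13, 30)]:      # delta..beta band power
        feats.append(psd[(f >= lo) & (f < hi)].sum())
    return np.array(feats)

# X: (n_epochs, n_features); y: 0 = VS/UWS, 1 = MCS, 2 = normal (random stand-ins)
X = np.vstack([eeg_features(np.random.randn(2500)) for _ in range(300)])
y = np.random.randint(0, 3, size=300)
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X, y)
print(clf.score(X, y))
```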
{"title":"Recognizing the consciousness states of DOC patients by classifying EEG signal","authors":"Junjie An, Chaoqun Weng, Chenghua Wang, Zhihua Huang","doi":"10.1109/CISP-BMEI56279.2022.9980122","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980122","url":null,"abstract":"Chronic disorders of consciousness (DOC) refers to brain damage caused by various reasons, resulting in the reduction or loss of patients' ability to perceive the stimuli from the environment and themselves. DOC includes vegetative state / unresponsive wakefulness syndrome (VS/UWS) and minimally conscious state (MCS). Many researchers have done a lot of research on the automatic classification of VS and MCS patients. In this study, we proposed an automatic state classification method based on machine learning. Firstly, the EEG signal is extracted by feature measurement methods such as time domain, frequency domain, time-frequency domain, and nonlinear analysis, and a total of 34 kinds of the abovementioned features are extracted. Then an eXtreme Gradient Boosting (XGBoost) classifier is established based on the extracted feature vectors and applied to the collected dataset for state classification. The data set in this paper uses the EEG data of 12 patients (including DOC and normal state) collected by Fujian Sanbo Funeng Brain Hospital for experiments to verify the feasibility and effectiveness of the proposed method. The experimental results show that the classification accuracy of the proposed method for VS, MCS, and Normal state patients is 99.91%.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127598074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Active and Passive Radar Target Fusion Recognition Method Based on Bayesian Network
Ruoyun Li, Yuxi Zhang, Jinping Sun
Multi-sensor fusion recognition technology can make full use of the complementarity of information between sensors to reduce the influence of interference and improve the success rate of target recognition, and it has been widely used in radar target recognition. Commonly used multi-sensor fusion recognition methods include Bayesian networks and D-S evidence theory; among them, the Bayesian network has attracted extensive attention because it rests on a solid probability-theory foundation and both its structure and parameters can be learned. This paper proposes a fusion recognition method for active and passive radar targets in which the recognition results from the two radars are fused by a Bayesian network. The results show that the recognition success rate of the Bayesian-network fusion method is 9.1%, 4.8%, and 2.2% higher than that of recognition using only the active radar, recognition using only the passive radar, and fusion based on D-S evidence theory, respectively, demonstrating the feasibility and effectiveness of the proposed method.
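Under the simplest Bayesian-network structure, where the two sensors' decisions are conditionally independent given the target class, fusing the radars' posteriors reduces to the calculation below. The priors and posteriors shown are made-up numbers, and the paper's actual network may be richer than this naive-Bayes form.

```python
import numpy as np

def fuse_posteriors(p_active, p_passive, prior):
    """Naive-Bayes fusion: P(c|a,p) is proportional to P(c|a) * P(c|p) / P(c),
    assuming conditional independence of the sensors given the class."""
    joint = p_active * p_passive / prior
    return joint / joint.sum()

prior = np.array([0.5, 0.3, 0.2])        # hypothetical class priors
p_active = np.array([0.6, 0.3, 0.1])     # posterior from the active-radar classifier
p_passive = np.array([0.5, 0.4, 0.1])    # posterior from the passive-radar classifier
print(fuse_posteriors(p_active, p_passive, prior))
```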
{"title":"Active and Passive Radar Target Fusion Recognition Method Based on Bayesian Network","authors":"Ruoyun Li, Yuxi Zhang, Jinping Sun","doi":"10.1109/CISP-BMEI56279.2022.9980098","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980098","url":null,"abstract":"Multi-sensor fusion recognition technology can make full use of the complementarity of information between sensors to reduce the influence of interference improves the success rate of target recognition, and has been widely used in the domain of radar target recognition. The multi-sensor fusion recognition methods that commonly used include Bayesian network, D-S evidence theory and so on, among which the Bayesian network has attracted extensive attention as not only it has a solid probability theory foundation but its structure and parameters can be learned. This paper proposes a fusion recognition method for active and passive radar target, the recognition results of active and passive radar targets are fused by the Bayesian network. The results show that the recognition success rate of using fusion recognition method based on Bayesian network is increased by 9.1%, 4.8% and 2.2% compared with that using recognition methods for only active radar target and only passive radar target and fusion recognition method based on D-S evidence theory, which proves the feasibility and effectiveness of the fusion recognition method based on Bayesian network.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132908481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multi-Scale Multi-View Model Based on Ensemble Attention for Benign-Malignant Lung Nodule Classification on Chest CT
Ruoyu Wu, Hong Huang
Accurate differential diagnosis of lung nodules is critical in the early screening of lung cancer. Although deep-learning-based methods have obtained good results, the large variation in nodule sizes and shapes restricts further performance improvement in automated diagnosis. In this paper, a multi-scale multi-view model based on ensemble attention (MSMV-EA) is proposed to discriminate benign and malignant nodules on chest computed tomography (CT). First, the raw CT scans are resampled to the same resolution and normalized to a uniform intensity, and multiple sets of input patches at different scales are extracted from nine fixed view angles of each nodule volume. Then, a three-branch convolutional neural network (CNN) framework is constructed to fully learn the rich spatial structural information of nodule CT images, yielding more discriminative representations. Finally, an ensemble attention module adaptively aggregates the multi-level deep features produced by the different sub-networks, boosting feature-integration efficiency in an end-to-end trainable fashion. Experimental results on the public lung nodule CT image dataset LIDC-IDRI demonstrate that the proposed MSMV-EA method identifies benign and malignant nodules better than several state-of-the-art (SOTA) approaches.
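A minimal sketch of the three-branch idea with a learned attention over branch features follows; patch extraction from the nine view angles, the real backbone depths, and the exact ensemble-attention design are all assumptions here.

```python
import torch
import torch.nn as nn

class MSMVSketch(nn.Module):
    """Three scale branches over 9-view patch stacks, fused by softmax
    attention over branch features; purely illustrative."""
    def __init__(self, n_views=9, n_classes=2):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(n_views, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
        self.branches = nn.ModuleList([branch() for _ in range(3)])  # one per scale
        self.attn = nn.Linear(64, 1)
        self.head = nn.Linear(64, n_classes)

    def forward(self, patches):          # patches: list of 3 tensors (B, 9, H_s, W_s)
        feats = torch.stack([b(p) for b, p in zip(self.branches, patches)], dim=1)
        w = torch.softmax(self.attn(feats), dim=1)   # (B, 3, 1) branch weights
        return self.head((w * feats).sum(dim=1))     # attention-weighted fusion
```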
{"title":"Multi-Scale Multi-View Model Based on Ensemble Attention for Benign-Malignant Lung Nodule Classification on Chest CT","authors":"Ruoyu Wu, Hong Huang","doi":"10.1109/CISP-BMEI56279.2022.9979905","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979905","url":null,"abstract":"The accurate differential diagnosis of lung nodules is critical in the early screening of lung cancer. Although deep learning-based methods have obtained good results, the large variations in sizes and shapes of nodules restrict further performance improvement in automated diagnosis. In this paper, a multi-scale multi-view model based on ensemble attention (MSMV-EA) is proposed to discriminate the benign and malignant nodules on chest computed tomography (CT). First, the raw CT scans are aligned to a same resolution and a uniform intensity, and multiple sets of input patches with different scales are extracted from nine fixed view angles of each nodule volume. Then, a convolutional neural network (CNN)-based three-branch framework is constructed to fully learn the rich spatial structural information of nodule CT images, and more discriminative representations can be harvested in this way. Finally, an ensemble attention module is developed to adaptively aggregate multi-level deep features produced from different sub-networks, which can boost feature integration efficiency in an end-to-end trainable fashion. Experimental results on the public lung nodule CT image dataset LIDC-IDRI demonstrate that the proposed MSMV-EA method possesses the superior identification performance of benign-malignant nodules compared with some state-of-the-art (SOTA) approaches.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133333035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Transformer-based severity detection of Parkinson's symptoms from gait
Hao-jun Sun, Zheng Zhang
This paper focuses on detecting the severity of Parkinson's disease by analyzing patients' gait. In recent years, with the spread of deep learning, gait detection technology has gradually matured, and such techniques are increasingly used in medical diagnostics, including Parkinson's severity detection. Transformer models, meanwhile, have been applied ever more widely and successfully in natural language processing and image recognition, illustrating the strong feature-extraction ability of Transformer-based models. In this paper, we propose a Transformer-based model to detect the severity of Parkinson's symptoms. Although the Transformer performed well in our earlier experiments, its large memory footprint is an obvious disadvantage, so we improved the model to decouple temporal and spatial information extraction, which greatly increases its speed. Concretely, we first obtain recordings from 18 foot sensors in a public dataset, preprocess the input time series, and add unique temporal position coding. Second, we feed the channels into 18 parallel temporal attention extraction modules, concatenate their outputs, and pass them through a dimensionality-reduction layer. Finally, the result is input to a spatial attention extraction module and classified through a final linear layer. We applied and compared the GLU (Gated Linear Unit) and GAU (Gated Attention Unit), which made our model better and faster. Experimental results on the public dataset provided by PhysioNet show that the model reaches 97.4% accuracy, about 11.7% higher than the original model. The improved algorithm is accurate and practical for Parkinson's gait analysis tasks and can better meet practical needs.
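The decoupled design can be sketched as below: one temporal self-attention stack per sensor channel, a dimensionality-reduction layer, then a single spatial attention step across the 18 channels. Layer sizes are guesses, and the GLU/GAU variants the authors compare are omitted.

```python
import torch
import torch.nn as nn

class GaitSeveritySketch(nn.Module):
    """Decoupled temporal-then-spatial attention over 18 gait sensor channels."""
    def __init__(self, seq_len=100, d_model=32, n_classes=3, n_sensors=18):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))  # learned position code
        self.temporal = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_sensors)])                           # 18 parallel modules
        self.reduce = nn.Linear(d_model, d_model // 2)             # dimensionality reduction
        self.spatial = nn.TransformerEncoderLayer(d_model // 2, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model // 2, n_classes)

    def forward(self, x):                          # x: (B, n_sensors, seq_len)
        chans = []
        for i, layer in enumerate(self.temporal):
            h = self.embed(x[:, i].unsqueeze(-1)) + self.pos
            chans.append(layer(h).mean(dim=1))     # (B, d_model) summary per sensor
        s = self.reduce(torch.stack(chans, dim=1)) # (B, 18, d_model // 2)
        return self.head(self.spatial(s).mean(dim=1))
```

Running attention over 18 short channel summaries instead of one long concatenated sequence is what shrinks the attention maps, consistent with the memory and speed argument in the abstract.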
{"title":"Transformer-based severity detection of Parkinson's symptoms from gait","authors":"Hao-jun Sun, Zheng Zhang","doi":"10.1109/CISP-BMEI56279.2022.9980289","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980289","url":null,"abstract":"This paper focuses on the severity detection of Parkinson's patients by analyzing their gait. In recent years, with the popularization of deep learning, gait detection technology has gradually matured. These techniques are increasingly used in medical diagnostics, such as Parkinson's severity detection. In recent years, Transformer models have been more and more widely and successfully used in the fields of natural language processing and image recognition. It illustrates that the Transformer-based model has a good ability for feature extraction. In this paper, we propose a Transformer-based model to detect the severity of Parkinson's symptoms. In the previous experiments, although the performance of the transformer is good, the disadvantage of its large memory footprint is also obvious. We improved our model to decouple temporal and spatial information extraction. This greatly increases the speed of the model. Concretely, we first obtained data consisting of 18 foot sensors from a public dataset, then preprocesses the input time series data, and adds unique temporal position coding to it. Second, feed them into 18 parallel temporal attention extraction modules and concatenate them together then input them into the dimensionality reduction layer for dimensionality reduction. Finally, they are input to the spatial attention extraction module and classified through the final linear layer. We applied and compared GLU (Gated Linear Unit), and GAU (Gated Attention Unit), which made our model better and faster. The experimental results show that using the public dataset provided by Physionet, the accuracy of the model reaches 97.4%, which is about 11.7% higher than the original model. The improved algorithm has high accuracy and practicability for Parkinson's gait analysis tasks and can better meet practical needs.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133759548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Classifying Insect Pests from Image Data using Deep Learning
Md. Raiyan Bin Mohsin, Sadia Afrin Ramisa, Mohammad Saad, Shahreen Husne Rabbani, Salwa Tamkin, Faisal Bin Ashraf, Md. Tanzim Reza
The damage insect pests inflict on agricultural productivity has become one of the main challenges in agriculture, yet a high-performance automated system capable of detecting nuisance insects in massive amounts of visual data must satisfy several prerequisites. In this study we employed deep learning approaches to identify insect species from large volumes of image data, and explainable AI to determine which parts of the photos the models use to categorize the insects. We worked with the large-scale IP102 dataset, a collection of almost 75,000 pictures divided into 102 categories, and ran state-of-the-art tests on it to evaluate our proposed solution. We used five deep neural network (DNN) models for image classification (VGG19, ResNet50, EfficientNetB5, DenseNet121, and InceptionV3) and implemented the LIME-based XAI (Explainable Artificial Intelligence) framework. DenseNet121 outperformed all other networks, and we also applied it to classify insect species on specific crops, where classification accuracy ranged from 46.31% to 95.36% across eight crops. Moreover, we compared our predictions with those of earlier articles to assess the efficacy of our research.
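A short sketch of the classification-plus-explanation setup follows, assuming the standard torchvision and lime APIs; the training loop and IP102 data loading are omitted, and the wrapper names are illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models
from lime import lime_image

# DenseNet121 re-headed for the 102 IP102 classes
model = models.densenet121(weights=None)  # pass "IMAGENET1K_V1" (torchvision >= 0.13) to start from ImageNet
model.classifier = nn.Linear(model.classifier.in_features, 102)
model.eval()

def predict_fn(images_np):
    """NumPy (N, H, W, 3) batch -> class probabilities, the signature LIME expects."""
    x = torch.from_numpy(images_np).permute(0, 3, 1, 2).float() / 255.0
    with torch.no_grad():
        return torch.softmax(model(x), dim=1).numpy()

# LIME then highlights the superpixels that drove a given prediction:
explainer = lime_image.LimeImageExplainer()
# explanation = explainer.explain_instance(image_np, predict_fn, top_labels=3)
```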
{"title":"Classifying Insect Pests from Image Data using Deep Learning","authors":"Md. Raiyan Bin Mohsin, Sadia Afrin Ramisa, Mohammad Saad, Shahreen Husne Rabbani, Salwa Tamkin, Faisal Bin Ashraf, Md. Tanzim Reza","doi":"10.1109/CISP-BMEI56279.2022.9979872","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979872","url":null,"abstract":"The fact that insecticidal pests impair significant agricultural productivity has become one of the main challenges in agriculture. Several prerequisites, however, exist for a high-performance automated system capable of detecting nuisance insects from massive amounts of visual data. We employed deep learning approaches to correctly identify insect species from large volumes of data in this study model and explainable AI to decide which part of the photos is used to categorize the insects from the data. We chose to deal with the large-scale IP102 dataset since we worked with a large dataset. There are almost 75,000 pictures in this collection, divided into 102 categories. We ran state-of-the-art tests on the unique IP102 data set to evaluate our proposed solution. We used five different Deep Neural Networks (DNN) models for image classification: VGG19, ResNet50, EfficientNetB5, DenseNet121, InceptionV3, and implemented the LIME-based XAI (Explainable Artificial Intelligence) framework. DenseNet121 outperformed all other networks, and we also implemented it to classify specific crop insect species. The classification accuracy ranged from 46.31 percent to 95.36 percent for eight crops. Moreover, we have compared our prediction to that of earlier articles to assess the efficacy of our research.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134138574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Arrhythmia Classification on Different Time Windows Using CSR-BiGRU Network
Yesong Liang, Liting Zhang, Xinge Jiang, Ying Wang, Rui Huo, Shoushui Wei
Arrhythmia is one of the most common cardiovascular diseases. At present, most arrhythmias are classified beat by beat, but heartbeat-level analysis has many problems: information such as the incomplete compensatory interval after a premature atrial beat cannot be used, heartbeat segmentation and interception introduce large errors, and the procedure wastes considerable running time. Research based on time windows can effectively alleviate these problems. For wearable real-time ECG monitoring systems, fast, accurate, and lightweight network design is the consensus goal of research. We propose a novel convolutional squeeze-and-excitation residual bidirectional GRU network (CSR-BiGRU) for arrhythmia time windows. Reflecting the characteristics of the ECG signal, an attention residual module (SERBlock) is fused into the CNN model, and a BiGRU is combined with it to process temporal information, achieving good results. On the MIT-BIH arrhythmia database, 10-fold cross-validation yields 98.60% accuracy and a 97.59% F1 score; the model accurately identifies five types of common arrhythmia with high detection performance, effectively making up for the shortcomings of heartbeat-level research.
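The abstract names the pieces (squeeze-and-excitation residual blocks over a CNN, then a BiGRU over the time window), so a skeletal version can be written directly; filter counts, kernel sizes, and depths below are placeholders, not the published configuration.

```python
import torch
import torch.nn as nn

class SERBlock(nn.Module):
    """1-D residual block with squeeze-and-excitation, per the CSR idea."""
    def __init__(self, ch, r=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(ch, ch, 7, padding=3), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, 7, padding=3), nn.BatchNorm1d(ch))
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Conv1d(ch, ch // r, 1), nn.ReLU(),
            nn.Conv1d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        h = self.conv(x)
        return torch.relu(x + h * self.se(h))      # channel-reweighted residual

class CSRBiGRUSketch(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.stem = nn.Conv1d(1, 32, 15, stride=2, padding=7)
        self.blocks = nn.Sequential(SERBlock(32), SERBlock(32))
        self.gru = nn.GRU(32, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                          # x: (B, 1, T) ECG time window
        h = self.blocks(self.stem(x)).transpose(1, 2)
        out, _ = self.gru(h)
        return self.head(out.mean(dim=1))
```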
{"title":"Arrhythmia Classification on Different Time Windows Using CSR-BiGRU Network","authors":"Yesong Liang, Liting Zhang, Xinge Jiang, Ying Wang, Rui Huo, Shoushui Wei","doi":"10.1109/CISP-BMEI56279.2022.9979856","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979856","url":null,"abstract":"Arrhythmia is one of the most common cardiovascular diseases. At present, most arrhythmias are classified by heartbeat. However, there are many problems with the use of heartbeat. For example, information such as incomplete compensatory interval after premature atrial beat cannot be used. There will also be a large error in the segmentation and interception of the heartbeat. It also wastes a lot of time while the program is running. However, research based on time window can effectively alleviate these problems. For wearable real-time ECG monitoring system, rapid, accurate and network lightweight design is the consensus of research. We propose a novel convolutional squeeze-and-excitation residual bidirectional GRU network (CSR-BiGRU) for arrhythmia time window. According to the characteristics of the ECG signal, the attention residual module (SERBlock) is fused into the CNN model, and BiGRU is combined to process the time information, which has achieved good results. Based on MIT-BIH arrhythmia database, the 10-fold cross validation was used to achieve 98.60% accuracy and 97.59% F1 score, which can accurately identify five types of common arrhythmias and has high detection performance, which can effectively make up for the shortage of heartbeat research.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133131877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Facial Attribute Editing based on Independent Selective Transfer Unit and Self-attention Mechanism
Xiaoning Liu, Peiyao Guo, Jinhong Liu, Dongcheng Tuo, Shiyu Lei, Yuejin Wang
Facial attribute editing aims to change facial attributes and can be regarded as an image translation problem. It is usually realized by combining an encoder-decoder with generative adversarial networks, but the generated images are not realistic enough, and such models offer only weak, coarse control over the facial attributes of the generated images. In this work, we propose ISTSA-GAN, a generative adversarial network based on an Independent Selective Transfer Unit (ISTU) and a self-attention mechanism. Building on STGAN, we replace the Selective Transfer Unit (STU) with the ISTU, combined with the encoder-decoder, to selectively transfer encoder features. In addition, a self-attention mechanism is introduced into the transposed-convolution layers of the decoder to establish long-distance dependencies across image regions. Finally, an attribute interpolation loss and a source-domain adversarial loss are added to constrain training. Experimental results show that the method improves attribute editing while preserving fine detail and enhances fine-grained control over edited attributes; it is superior to classical methods in attribute-editing accuracy and image quality.
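The self-attention layer added to the decoder's transposed-convolution stage is plausibly a SAGAN-style block like the one sketched below; the abstract does not give its exact form, so treat this as an assumption, and the ISTU itself and the loss terms are not reproduced.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial positions of a feature map;
    requires channel count divisible by 8."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (B, HW, C//8)
        k = self.k(x).flatten(2)                    # (B, C//8, HW)
        attn = torch.softmax(q @ k, dim=-1)         # (B, HW, HW) long-range weights
        v = self.v(x).flatten(2)                    # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                 # residual, starts as identity
```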
{"title":"Facial Attribute Editing based on Independent Selective Transfer Unit and Self-attention Mechanism","authors":"Xiaoning Liu, Peiyao Guo, Jinhong Liu, Dongcheng Tuo, Shiyu Lei, Yuejin Wang","doi":"10.1109/CISP-BMEI56279.2022.9979903","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979903","url":null,"abstract":"Facial attribute editing aims to change the facial attributes, which can be regarded as an image translation problem. Facial attribute editing is usually realized by combining encoder-decoder and Generative Adversarial Networks, but the generated image is not realistic enough, and the model has weak ability to control the fine granularity of face attributes of generated images. In this work, we propose a Generative Adversarial Network ISTSA-GAN based on Independent Selective Transfer Unit (ISTU) and Self-attention Mechanism. On the basis of STGAN, we use ISTU instead of Selective Transfer Unit (STU) to combine with encoder-decoder to selectively transfer the features of encoder. In addition, a self-attention mechanism is introduced into the transposed convolution layer of the decoder to establish long-distance dependence of the model across image regions. Finally, attribute interpolation loss and source domain adversarial loss are added to constrain the training of the model. Experimental results show that this method can improve the ability of editing attributes and saving much details, and enhance the ability of fine-grained control of editing attributes. It is superior to classical methods in attribute editing accuracy and image quality.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115609885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0