Vision-Guided Speaker Embedding Based Speech Separation
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980110
Yuanjie Deng, Ying Wei
Speech is easily corrupted by the acoustic environment and by noise, whereas the visual information associated with a speaker, such as lip movement and facial appearance, is more robust. In this paper, a vision-guided, speaker-embedding-based speech separation framework is proposed for separating mixed speech. Speaker embeddings are integrated on top of visual guidance. Specifically, we propose two schemes for extracting the speaker embedding: using clean auxiliary speech from each speaker in a one-stage network, and using the speech separated at the first stage in a two-stage network. The two-stage scheme avoids the requirement for clean auxiliary speech: it extracts speaker information from the progressively cleaner speech produced during separation, a continual self-improvement process. Effective speaker embeddings can therefore be extracted even when only mixed speech is available, which is more practical in real-world scenarios. Comparative experiments on the public VoxCeleb2 dataset demonstrate the effectiveness of the proposed method.
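As an illustration of the two-stage scheme, here is a minimal PyTorch sketch under our own assumptions (the layer sizes, GRU-based separators, and spectrogram-masking formulation are ours, not the paper's): stage one separates with visual guidance alone, a speaker encoder embeds the stage-one estimate, and stage two refines the separation with that embedding.

```python
import torch
import torch.nn as nn

class TwoStageSeparator(nn.Module):
    def __init__(self, n_freq=257, emb_dim=256):
        super().__init__()
        self.visual_enc = nn.GRU(512, emb_dim, batch_first=True)      # lip/face features -> visual stream
        self.speaker_enc = nn.GRU(n_freq, emb_dim, batch_first=True)  # stage-1 output -> speaker embedding
        self.stage1 = nn.GRU(n_freq + emb_dim, n_freq, batch_first=True)
        self.stage2 = nn.GRU(n_freq + 2 * emb_dim, n_freq, batch_first=True)

    def forward(self, mix_spec, visual_feats):
        # mix_spec: (B, T, F) magnitude spectrogram; visual_feats: (B, T, 512)
        v, _ = self.visual_enc(visual_feats)
        mask1, _ = self.stage1(torch.cat([mix_spec, v], dim=-1))
        est1 = torch.sigmoid(mask1) * mix_spec           # stage-1 estimate (vision-guided only)
        _, spk = self.speaker_enc(est1)                  # embed the gradually cleaner speech
        spk = spk[-1].unsqueeze(1).expand(-1, mix_spec.size(1), -1)
        mask2, _ = self.stage2(torch.cat([mix_spec, v, spk], dim=-1))
        return torch.sigmoid(mask2) * mix_spec           # refined stage-2 estimate
```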
{"title":"Vision-Guided Speaker Embedding Based Speech Separation","authors":"Yuanjie Deng, Ying Wei","doi":"10.1109/CISP-BMEI56279.2022.9980110","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980110","url":null,"abstract":"Speech is more affected by the environment and noise, while the visual information corresponding to the speaker, such as lip movement and facial appearance are more robust. In this paper, a vision-guided speaker embedding based speech separation framework is proposed for the scenario of mixed speech separation. The speaker embedding is integrated on the basis of visual guidance. Specifically, we proposed two schemes to extract speaker embedding: using the clean additional speech of the speakers in a one-stage network, and using the separated speech at the first stage in a two-stage network. The two-stage scheme avoids the limitation of using clean additional speech. It utilizes gradually clean speech during the separation to extract the speech information, which is a continuous self-improvement process. Therefore, effective speaker embedding can be extracted even when only mixed speech is present. This is more practical in real-world scenarios. We conducted comparative experiments on the public dataset VoxCeleb2 and demonstrated the effectiveness of the proposed method.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114186118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LP3DAM: Lightweight Parallel 3D Attention Module for Violence Detection
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979818
Jiehang Deng, Yusheng Zheng, Wei Wang, Kunkun Xiong, Kun Zou
Recent studies have shown that adding attention mechanisms to deep convolutional neural networks can effectively improve performance, but attention mechanisms remain underexplored for violence detection, mainly because violence detection relies on 3D convolutional networks. Most existing attention modules are designed for 2D convolution, and they tend to grow more complex in pursuit of better performance, which inevitably increases model complexity. To balance network performance against complexity, and to explore the effectiveness and feasibility of attention mechanisms in 3D convolutional models, this paper proposes the Lightweight Parallel 3D Attention Module (LP3DAM), which substantially improves model accuracy while adding only a small number of parameters. Experiments show that LP3DAM benefits lightweight 3D convolutional networks, raising the accuracy of MiNet-3D on the Hockey, Crowd, and RWF-2000 datasets by 1.44%, 4.84%, and 0.71%, respectively, while adding fewer than 1K parameters and about 0.26M FLOPs to the original network.
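The abstract does not spell out LP3DAM's internal structure, so the sketch below shows only the general idea of a lightweight attention block adapted to 3D convolution: an SE-style channel-attention module with 3D pooling, whose added parameter count stays small (two C x C/r projections). This is our own illustrative design, not the paper's module.

```python
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)            # (B, C, T, H, W) -> (B, C, 1, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w                                    # reweight channels, shape preserved
```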
{"title":"LP3DAM: Lightweight Parallel 3D Attention Module for Violence Detection","authors":"Jiehang Deng, Yusheng Zheng, Wei Wang, Kunkun Xiong, Kun Zou","doi":"10.1109/CISP-BMEI56279.2022.9979818","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979818","url":null,"abstract":"Recent studies have shown that the attention mechanism added to the deep convolutional neural network can effectively improve the network performance, but the attention mechanism applied to the field of violence detection has not been developed. The main reason is that violence detection uses 3D convolution network. At present, most attention modules are only suitable for 2D convolution, and these modules are designed as more complex modules to obtain better network performance, which inevitably increases the complexity of the network model. In order to overcome the trade-off between network performance and complexity, and explore the effectiveness and feasibility of attention mechanism in 3D convolutional network model, this paper proposes Lightweight Parallel 3D Attention Module (LP3DAM), which greatly improves the accuracy of the model by adding a small amount of parameters. Experiments show that LP3DAM has a positive effect on 3D lightweight convolutional networks, which makes the accuracy of the network (MiNet-3D) on the three datasets of Hockey, Crowd and RWF-2000 increase by 1.44%, 4.84% and 0.71%, respectively. The number of parameters added to the original network is controlled within 1K, and the increase of Flops is controlled at about 0.26M.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121195279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Retrosynthesis Prediction Based on Graph Relation Network
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979857
Zhaoxu Dong, Zhao Chen, Qian Wang
Retrosynthetic analysis is one of the most fundamental and widely used methods for planning compound synthesis routes, and single-step retrosynthesis prediction is the basis for predicting the full synthesis route of a compound. With computers now applied across disciplines, computer-aided retrosynthesis is increasingly common, and the rise of artificial intelligence has led many researchers to apply purely data-driven deep learning models to the problem. Many deep learning methods exist for single-step retrosynthesis prediction, but an end-to-end method using a graph convolutional neural network has been lacking. In this paper, we propose a template-based graph relation network for predicting the single-step synthesis of compounds. The model learns encodings of molecules and templates and predicts whether a relationship exists between them, so the reactants it predicts for a target molecule are highly interpretable. In addition, we use a new dataset containing a wide variety of reaction and template data, further verifying the practicality of the model.
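The abstract leaves the network details open; the following sketch illustrates one plausible reading under our own assumptions (a one-layer graph convolution, a learned template embedding table, and a bilinear relation head), scoring whether a product molecule and a reaction template are related.

```python
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    def __init__(self, n_atom_feats=32, hidden=128, n_templates=10000):
        super().__init__()
        self.gcn_w = nn.Linear(n_atom_feats, hidden)
        self.templates = nn.Embedding(n_templates, hidden)
        self.bilinear = nn.Bilinear(hidden, hidden, 1)

    def forward(self, adj, atom_feats, template_ids):
        # adj: (B, N, N) normalized adjacency; atom_feats: (B, N, n_atom_feats)
        h = torch.relu(self.gcn_w(adj @ atom_feats))    # one message-passing step
        mol = h.mean(dim=1)                             # mean-pool atoms -> molecule embedding
        tpl = self.templates(template_ids)              # (B, hidden) template embedding
        return self.bilinear(mol, tpl).squeeze(-1)      # relation logit per (molecule, template)
```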
{"title":"Retrosynthesis Prediction Based on Graph Relation Network","authors":"Zhaoxu Dong, Zhao Chen, Qian Wang","doi":"10.1109/CISP-BMEI56279.2022.9979857","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979857","url":null,"abstract":"Retrosynthetic analysis is one of the most basic and commonly used methods for compound synthesis routes planning. In the process, the single-step synthesis prediction is the basis for predicting the synthesis route of the whole compound. With the wide application of computers in various disciplines, the use of computer-aided retrosynthetic process is becoming more and more common. The rise of artificial intelligence also makes more and more people apply pure data-driven deep learning models to retrosynthetic methods. At present, there are many deep learning-based methods to solve the problem of single-step retrosynthetic prediction. However, there is a lack of an end-to-end method using graph convolutional neural network for prediction. In this paper, we propose a template-based graph relation network for the prediction of single-step synthesis of compounds. The model can learn the coding of molecules and templates to predict whether there is a relationship between them. Therefore, the reactants of target molecules predicted by this model have great interpretability. In addition, in this experiment, we used a new dataset, which has a variety of reaction and template data, and further verified the practicability of the model.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123871839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recognizing the consciousness states of DOC patients by classifying EEG signal
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980122
Junjie An, Chaoqun Weng, Chenghua Wang, Zhihua Huang
Chronic disorders of consciousness (DOC) refer to brain damage, caused by various factors, that reduces or eliminates a patient's ability to perceive stimuli from the environment and from their own body. DOC comprises the vegetative state / unresponsive wakefulness syndrome (VS/UWS) and the minimally conscious state (MCS), and the automatic classification of VS and MCS patients has been studied extensively. In this study, we propose an automatic state classification method based on machine learning. First, 34 features are extracted from the EEG signal using time-domain, frequency-domain, time-frequency-domain, and nonlinear analysis. An eXtreme Gradient Boosting (XGBoost) classifier is then trained on the extracted feature vectors and applied to the collected dataset for state classification. The dataset comprises EEG recordings of 12 subjects (DOC patients and normal controls) collected by Fujian Sanbo Funeng Brain Hospital, used to verify the feasibility and effectiveness of the proposed method. Experimental results show that the proposed method classifies VS, MCS, and normal states with an accuracy of 99.91%.
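A condensed sketch of this kind of pipeline, with assumed feature choices (only two of the 34 feature families are shown: time-domain statistics and alpha-band power via Welch's method) and assumed XGBoost hyperparameters:

```python
import numpy as np
from scipy.signal import welch
from xgboost import XGBClassifier

def epoch_features(epoch, fs=250):
    # epoch: (n_channels, n_samples) -> per-channel feature vector
    feats = [epoch.mean(axis=1), epoch.std(axis=1)]     # time-domain statistics
    f, pxx = welch(epoch, fs=fs, axis=1)                # frequency-domain analysis
    alpha = (f >= 8) & (f <= 13)
    feats.append(pxx[:, alpha].sum(axis=1))             # alpha-band power
    return np.concatenate(feats)

# X: (n_epochs, n_features) stacked feature vectors; y: encoded VS / MCS / Normal labels
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
# clf.fit(X_train, y_train); clf.score(X_test, y_test)
```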
{"title":"Recognizing the consciousness states of DOC patients by classifying EEG signal","authors":"Junjie An, Chaoqun Weng, Chenghua Wang, Zhihua Huang","doi":"10.1109/CISP-BMEI56279.2022.9980122","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980122","url":null,"abstract":"Chronic disorders of consciousness (DOC) refers to brain damage caused by various reasons, resulting in the reduction or loss of patients' ability to perceive the stimuli from the environment and themselves. DOC includes vegetative state / unresponsive wakefulness syndrome (VS/UWS) and minimally conscious state (MCS). Many researchers have done a lot of research on the automatic classification of VS and MCS patients. In this study, we proposed an automatic state classification method based on machine learning. Firstly, the EEG signal is extracted by feature measurement methods such as time domain, frequency domain, time-frequency domain, and nonlinear analysis, and a total of 34 kinds of the abovementioned features are extracted. Then an eXtreme Gradient Boosting (XGBoost) classifier is established based on the extracted feature vectors and applied to the collected dataset for state classification. The data set in this paper uses the EEG data of 12 patients (including DOC and normal state) collected by Fujian Sanbo Funeng Brain Hospital for experiments to verify the feasibility and effectiveness of the proposed method. The experimental results show that the classification accuracy of the proposed method for VS, MCS, and Normal state patients is 99.91%.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127598074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active and Passive Radar Target Fusion Recognition Method Based on Bayesian Network
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980098
Ruoyun Li, Yuxi Zhang, Jinping Sun
Multi-sensor fusion recognition can fully exploit the complementarity of information between sensors to reduce the influence of interference and improve the target recognition success rate, and it has been widely used in radar target recognition. Commonly used multi-sensor fusion recognition methods include Bayesian networks and Dempster-Shafer (D-S) evidence theory; the Bayesian network has attracted particular attention because it rests on a solid probability-theoretic foundation and both its structure and its parameters can be learned from data. This paper proposes a fusion recognition method for active and passive radar targets in which the recognition results of the two radars are fused by a Bayesian network. The results show that the recognition success rate of the Bayesian-network fusion method is 9.1%, 4.8%, and 2.2% higher than that of recognition using only the active radar, recognition using only the passive radar, and fusion based on D-S evidence theory, respectively, demonstrating the feasibility and effectiveness of the proposed method.
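As a worked illustration of the fusion step, the sketch below uses a naive Bayes simplification (sensor decisions conditionally independent given the true class) rather than the paper's learned network structure; the priors and posteriors are made-up numbers.

```python
import numpy as np

def fuse(p_active, p_passive, prior):
    # p_active, p_passive: per-class posteriors from each radar, shape (K,)
    # Bayes: P(c | a, p) is proportional to P(a | c) P(p | c) P(c),
    # i.e. proportional to p_a(c) * p_p(c) / P(c) when given posteriors.
    post = p_active * p_passive / prior
    return post / post.sum()

prior = np.array([0.5, 0.3, 0.2])        # three target classes (illustrative)
p_active = np.array([0.6, 0.3, 0.1])     # active radar posterior
p_passive = np.array([0.4, 0.45, 0.15])  # passive radar posterior
print(fuse(p_active, p_passive, prior))  # fused class posterior
```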
{"title":"Active and Passive Radar Target Fusion Recognition Method Based on Bayesian Network","authors":"Ruoyun Li, Yuxi Zhang, Jinping Sun","doi":"10.1109/CISP-BMEI56279.2022.9980098","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980098","url":null,"abstract":"Multi-sensor fusion recognition technology can make full use of the complementarity of information between sensors to reduce the influence of interference improves the success rate of target recognition, and has been widely used in the domain of radar target recognition. The multi-sensor fusion recognition methods that commonly used include Bayesian network, D-S evidence theory and so on, among which the Bayesian network has attracted extensive attention as not only it has a solid probability theory foundation but its structure and parameters can be learned. This paper proposes a fusion recognition method for active and passive radar target, the recognition results of active and passive radar targets are fused by the Bayesian network. The results show that the recognition success rate of using fusion recognition method based on Bayesian network is increased by 9.1%, 4.8% and 2.2% compared with that using recognition methods for only active radar target and only passive radar target and fusion recognition method based on D-S evidence theory, which proves the feasibility and effectiveness of the fusion recognition method based on Bayesian network.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132908481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Scale Multi-View Model Based on Ensemble Attention for Benign-Malignant Lung Nodule Classification on Chest CT
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979905
Ruoyu Wu, Hong Huang
The accurate differential diagnosis of lung nodules is critical in the early screening of lung cancer. Although deep learning-based methods have achieved good results, the large variation in nodule size and shape restricts further performance gains in automated diagnosis. In this paper, a multi-scale multi-view model based on ensemble attention (MSMV-EA) is proposed to discriminate benign from malignant nodules on chest computed tomography (CT). First, the raw CT scans are resampled to a common resolution and intensity range, and multiple sets of input patches at different scales are extracted from nine fixed view angles of each nodule volume. Then, a three-branch convolutional neural network (CNN) framework is constructed to fully learn the rich spatial structural information in nodule CT images, yielding more discriminative representations. Finally, an ensemble attention module is developed to adaptively aggregate the multi-level deep features produced by the different sub-networks, boosting feature integration efficiency in an end-to-end trainable fashion. Experimental results on the public lung nodule CT dataset LIDC-IDRI demonstrate that the proposed MSMV-EA method outperforms several state-of-the-art (SOTA) approaches in benign-malignant nodule identification.
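The ensemble attention module's exact design is not given in the abstract; a minimal sketch of attention-weighted aggregation over the branch feature vectors, with assumed dimensions, might look like this:

```python
import torch
import torch.nn as nn

class EnsembleAttention(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)   # learned relevance score per branch

    def forward(self, branch_feats):
        # branch_feats: (B, n_branches, feat_dim), one vector per scale/branch
        w = torch.softmax(self.score(branch_feats), dim=1)  # (B, n_branches, 1)
        return (w * branch_feats).sum(dim=1)                # attention-weighted fusion

fuse = EnsembleAttention()
feats = torch.randn(2, 3, 512)   # three sub-network outputs
print(fuse(feats).shape)         # torch.Size([2, 512])
```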
{"title":"Multi-Scale Multi-View Model Based on Ensemble Attention for Benign-Malignant Lung Nodule Classification on Chest CT","authors":"Ruoyu Wu, Hong Huang","doi":"10.1109/CISP-BMEI56279.2022.9979905","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979905","url":null,"abstract":"The accurate differential diagnosis of lung nodules is critical in the early screening of lung cancer. Although deep learning-based methods have obtained good results, the large variations in sizes and shapes of nodules restrict further performance improvement in automated diagnosis. In this paper, a multi-scale multi-view model based on ensemble attention (MSMV-EA) is proposed to discriminate the benign and malignant nodules on chest computed tomography (CT). First, the raw CT scans are aligned to a same resolution and a uniform intensity, and multiple sets of input patches with different scales are extracted from nine fixed view angles of each nodule volume. Then, a convolutional neural network (CNN)-based three-branch framework is constructed to fully learn the rich spatial structural information of nodule CT images, and more discriminative representations can be harvested in this way. Finally, an ensemble attention module is developed to adaptively aggregate multi-level deep features produced from different sub-networks, which can boost feature integration efficiency in an end-to-end trainable fashion. Experimental results on the public lung nodule CT image dataset LIDC-IDRI demonstrate that the proposed MSMV-EA method possesses the superior identification performance of benign-malignant nodules compared with some state-of-the-art (SOTA) approaches.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133333035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transformer-based severity detection of Parkinson's symptoms from gait
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9980289
Hao-jun Sun, Zheng Zhang
This paper focuses on detecting the severity of Parkinson's disease by analyzing patients' gait. With the spread of deep learning, gait detection technology has matured and is increasingly used in medical diagnostics such as Parkinson's severity detection. Transformer models, meanwhile, have been applied widely and successfully in natural language processing and image recognition, demonstrating strong feature extraction ability. In this paper, we propose a Transformer-based model to detect the severity of Parkinson's symptoms. Although the Transformer performed well in our preliminary experiments, its large memory footprint is an obvious drawback, so we improve the model by decoupling temporal and spatial information extraction, which greatly increases its speed. Concretely, we first obtain recordings from 18 foot sensors in a public dataset, preprocess the input time series, and add temporal position encoding. We then feed the series into 18 parallel temporal attention extraction modules, concatenate their outputs, and pass them through a dimensionality reduction layer. Finally, the result is fed into a spatial attention extraction module and classified through a final linear layer. We applied and compared the GLU (Gated Linear Unit) and GAU (Gated Attention Unit), which made the model both better and faster. On the public dataset provided by PhysioNet, the model reaches 97.4% accuracy, about 11.7% higher than the original model. The improved algorithm is accurate and practical for Parkinson's gait analysis tasks and better meets practical needs.
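A minimal sketch of the decoupling idea, with our own layer sizes and module choices assumed (standard multi-head attention in place of the paper's GLU/GAU variants): attention runs along time independently for each of the 18 sensor channels, then along the sensor axis, avoiding one joint attention over all sensor-time tokens, which is where the memory saving comes from.

```python
import torch
import torch.nn as nn

class DecoupledGaitAttention(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        self.temporal = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.spatial = nn.MultiheadAttention(d_model, 4, batch_first=True)

    def forward(self, x):
        # x: (B, T, 18) raw force readings from the 18 foot sensors
        b, t, s = x.shape
        h = self.embed(x.permute(0, 2, 1).reshape(b * s, t, 1))  # (B*S, T, d)
        h, _ = self.temporal(h, h, h)          # attention over time, per sensor
        h = h.mean(dim=1).reshape(b, s, -1)    # pool time -> (B, S, d)
        h, _ = self.spatial(h, h, h)           # attention across sensors
        return h.mean(dim=1)                   # (B, d) sequence representation
```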
{"title":"Transformer-based severity detection of Parkinson's symptoms from gait","authors":"Hao-jun Sun, Zheng Zhang","doi":"10.1109/CISP-BMEI56279.2022.9980289","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9980289","url":null,"abstract":"This paper focuses on the severity detection of Parkinson's patients by analyzing their gait. In recent years, with the popularization of deep learning, gait detection technology has gradually matured. These techniques are increasingly used in medical diagnostics, such as Parkinson's severity detection. In recent years, Transformer models have been more and more widely and successfully used in the fields of natural language processing and image recognition. It illustrates that the Transformer-based model has a good ability for feature extraction. In this paper, we propose a Transformer-based model to detect the severity of Parkinson's symptoms. In the previous experiments, although the performance of the transformer is good, the disadvantage of its large memory footprint is also obvious. We improved our model to decouple temporal and spatial information extraction. This greatly increases the speed of the model. Concretely, we first obtained data consisting of 18 foot sensors from a public dataset, then preprocesses the input time series data, and adds unique temporal position coding to it. Second, feed them into 18 parallel temporal attention extraction modules and concatenate them together then input them into the dimensionality reduction layer for dimensionality reduction. Finally, they are input to the spatial attention extraction module and classified through the final linear layer. We applied and compared GLU (Gated Linear Unit), and GAU (Gated Attention Unit), which made our model better and faster. The experimental results show that using the public dataset provided by Physionet, the accuracy of the model reaches 97.4%, which is about 11.7% higher than the original model. The improved algorithm has high accuracy and practicability for Parkinson's gait analysis tasks and can better meet practical needs.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133759548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classifying Insect Pests from Image Data using Deep Learning
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979872
Md. Raiyan Bin Mohsin, Sadia Afrin Ramisa, Mohammad Saad, Shahreen Husne Rabbani, Salwa Tamkin, Faisal Bin Ashraf, Md. Tanzim Reza
Insect pests that impair agricultural productivity have become one of the main challenges in agriculture, yet a high-performance automated system capable of detecting nuisance insects from massive amounts of visual data has several prerequisites. In this study we employ deep learning to identify insect species from large volumes of image data, and explainable AI to determine which parts of the photos the models use to categorize the insects. We work with the large-scale IP102 dataset, a collection of roughly 75,000 images divided into 102 categories, and run state-of-the-art experiments on it to evaluate our proposed solution. We used five Deep Neural Network (DNN) models for image classification: VGG19, ResNet50, EfficientNetB5, DenseNet121, and InceptionV3, and implemented a LIME-based XAI (Explainable Artificial Intelligence) framework. DenseNet121 outperformed all other networks, and we also applied it to classify crop-specific insect species, with classification accuracy ranging from 46.31% to 95.36% across eight crops. Moreover, we compared our predictions with those of earlier articles to assess the efficacy of our research.
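A minimal sketch of the best-performing configuration, with placeholder preprocessing and hyperparameters: DenseNet121 re-headed for the 102 IP102 classes, plus a LIME image explainer wired to the model's softmax outputs.

```python
import torch
import torch.nn as nn
from torchvision import models
from lime import lime_image

model = models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, 102)  # IP102 head
model.eval()

def predict_fn(images):
    # LIME passes a batch of (H, W, 3) numpy images; convert to NCHW tensors
    x = torch.tensor(images).permute(0, 3, 1, 2).float() / 255.0
    with torch.no_grad():
        return torch.softmax(model(x), dim=1).numpy()

explainer = lime_image.LimeImageExplainer()
# explanation = explainer.explain_instance(img, predict_fn, top_labels=3, num_samples=500)
# explanation highlights the superpixels that drove the predicted insect class
```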
{"title":"Classifying Insect Pests from Image Data using Deep Learning","authors":"Md. Raiyan Bin Mohsin, Sadia Afrin Ramisa, Mohammad Saad, Shahreen Husne Rabbani, Salwa Tamkin, Faisal Bin Ashraf, Md. Tanzim Reza","doi":"10.1109/CISP-BMEI56279.2022.9979872","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979872","url":null,"abstract":"The fact that insecticidal pests impair significant agricultural productivity has become one of the main challenges in agriculture. Several prerequisites, however, exist for a high-performance automated system capable of detecting nuisance insects from massive amounts of visual data. We employed deep learning approaches to correctly identify insect species from large volumes of data in this study model and explainable AI to decide which part of the photos is used to categorize the insects from the data. We chose to deal with the large-scale IP102 dataset since we worked with a large dataset. There are almost 75,000 pictures in this collection, divided into 102 categories. We ran state-of-the-art tests on the unique IP102 data set to evaluate our proposed solution. We used five different Deep Neural Networks (DNN) models for image classification: VGG19, ResNet50, EfficientNetB5, DenseNet121, InceptionV3, and implemented the LIME-based XAI (Explainable Artificial Intelligence) framework. DenseNet121 outperformed all other networks, and we also implemented it to classify specific crop insect species. The classification accuracy ranged from 46.31 percent to 95.36 percent for eight crops. Moreover, we have compared our prediction to that of earlier articles to assess the efficacy of our research.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134138574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arrhythmia Classification on Different Time Windows Using CSR-BiGRU Network
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979856
Yesong Liang, Liting Zhang, Xinge Jiang, Ying Wang, Rui Huo, Shoushui Wei
Arrhythmia is one of the most common cardiovascular diseases. At present, most arrhythmias are classified beat by beat, but heartbeat-based approaches have several problems: information such as the incomplete compensatory interval after a premature atrial beat cannot be used, heartbeat segmentation and interception introduce large errors, and considerable runtime is wasted. Research based on time windows can effectively alleviate these problems. For wearable real-time ECG monitoring systems, fast, accurate, and lightweight network design is the research consensus. We propose a novel convolutional squeeze-and-excitation residual bidirectional GRU network (CSR-BiGRU) for arrhythmia time windows. Tailored to the characteristics of the ECG signal, an attention residual module (SERBlock) is fused into the CNN model and combined with a BiGRU to process the temporal information, achieving good results. On the MIT-BIH arrhythmia database, 10-fold cross-validation achieves 98.60% accuracy and a 97.59% F1 score. The method accurately identifies five types of common arrhythmia with high detection performance, effectively compensating for the shortcomings of heartbeat-based research.
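A minimal sketch of the CSR-BiGRU idea under our own layer choices (kernel sizes, widths, and pooling are assumptions): a 1D convolutional residual block with squeeze-and-excitation channel reweighting in the spirit of SERBlock, followed by a bidirectional GRU over the time axis.

```python
import torch
import torch.nn as nn

class SEResBlock1d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, 7, padding=3), nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, 7, padding=3), nn.BatchNorm1d(channels),
        )
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, T) ECG feature map
        y = self.conv(x)
        w = self.se(y).unsqueeze(-1)           # squeeze-and-excitation weights
        return torch.relu(x + y * w)           # residual connection

class CSRBiGRU(nn.Module):
    def __init__(self, channels=32, n_classes=5):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, 15, stride=2, padding=7)
        self.block = SEResBlock1d(channels)
        self.gru = nn.GRU(channels, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)  # five arrhythmia classes

    def forward(self, x):                      # x: (B, 1, T) one ECG time window
        h = self.block(self.stem(x)).permute(0, 2, 1)
        out, _ = self.gru(h)
        return self.head(out[:, -1])
```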
{"title":"Arrhythmia Classification on Different Time Windows Using CSR-BiGRU Network","authors":"Yesong Liang, Liting Zhang, Xinge Jiang, Ying Wang, Rui Huo, Shoushui Wei","doi":"10.1109/CISP-BMEI56279.2022.9979856","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979856","url":null,"abstract":"Arrhythmia is one of the most common cardiovascular diseases. At present, most arrhythmias are classified by heartbeat. However, there are many problems with the use of heartbeat. For example, information such as incomplete compensatory interval after premature atrial beat cannot be used. There will also be a large error in the segmentation and interception of the heartbeat. It also wastes a lot of time while the program is running. However, research based on time window can effectively alleviate these problems. For wearable real-time ECG monitoring system, rapid, accurate and network lightweight design is the consensus of research. We propose a novel convolutional squeeze-and-excitation residual bidirectional GRU network (CSR-BiGRU) for arrhythmia time window. According to the characteristics of the ECG signal, the attention residual module (SERBlock) is fused into the CNN model, and BiGRU is combined to process the time information, which has achieved good results. Based on MIT-BIH arrhythmia database, the 10-fold cross validation was used to achieve 98.60% accuracy and 97.59% F1 score, which can accurately identify five types of common arrhythmias and has high detection performance, which can effectively make up for the shortage of heartbeat research.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133131877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial Attribute Editing based on Independent Selective Transfer Unit and Self-attention Mechanism
Pub Date: 2022-11-05 | DOI: 10.1109/CISP-BMEI56279.2022.9979903
Xiaoning Liu, Peiyao Guo, Jinhong Liu, Dongcheng Tuo, Shiyu Lei, Yuejin Wang
Facial attribute editing aims to change facial attributes and can be regarded as an image translation problem. It is usually realized by combining an encoder-decoder with Generative Adversarial Networks, but the generated images are not realistic enough and the models offer weak fine-grained control over the facial attributes of the generated images. In this work, we propose ISTSA-GAN, a Generative Adversarial Network based on an Independent Selective Transfer Unit (ISTU) and a self-attention mechanism. Building on STGAN, we use the ISTU in place of the Selective Transfer Unit (STU), combining it with the encoder-decoder to selectively transfer encoder features. In addition, a self-attention mechanism is introduced into the transposed convolution layers of the decoder to establish long-range dependence across image regions. Finally, an attribute interpolation loss and a source-domain adversarial loss are added to constrain training. Experimental results show that the method improves attribute editing while preserving more detail and strengthens fine-grained control of the edited attributes; it outperforms classical methods in attribute editing accuracy and image quality.
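As an illustration of the decoder-side self-attention, here is a minimal SAGAN-style self-attention layer (a standard formulation, not necessarily the paper's exact variant) that lets every spatial position attend to all others, modeling the long-range dependence described above.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # start as identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (B, HW, C/8) queries
        k = self.k(x).flatten(2)                    # (B, C/8, HW) keys
        v = self.v(x).flatten(2)                    # (B, C, HW) values
        attn = torch.softmax(q @ k, dim=-1)         # (B, HW, HW) position affinities
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                 # residual self-attention
```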
{"title":"Facial Attribute Editing based on Independent Selective Transfer Unit and Self-attention Mechanism","authors":"Xiaoning Liu, Peiyao Guo, Jinhong Liu, Dongcheng Tuo, Shiyu Lei, Yuejin Wang","doi":"10.1109/CISP-BMEI56279.2022.9979903","DOIUrl":"https://doi.org/10.1109/CISP-BMEI56279.2022.9979903","url":null,"abstract":"Facial attribute editing aims to change the facial attributes, which can be regarded as an image translation problem. Facial attribute editing is usually realized by combining encoder-decoder and Generative Adversarial Networks, but the generated image is not realistic enough, and the model has weak ability to control the fine granularity of face attributes of generated images. In this work, we propose a Generative Adversarial Network ISTSA-GAN based on Independent Selective Transfer Unit (ISTU) and Self-attention Mechanism. On the basis of STGAN, we use ISTU instead of Selective Transfer Unit (STU) to combine with encoder-decoder to selectively transfer the features of encoder. In addition, a self-attention mechanism is introduced into the transposed convolution layer of the decoder to establish long-distance dependence of the model across image regions. Finally, attribute interpolation loss and source domain adversarial loss are added to constrain the training of the model. Experimental results show that this method can improve the ability of editing attributes and saving much details, and enhance the ability of fine-grained control of editing attributes. It is superior to classical methods in attribute editing accuracy and image quality.","PeriodicalId":198522,"journal":{"name":"2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115609885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}