Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition: Latest Articles

RGFGM-LXMERT-An Improve Architecture Based On LXMERT

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581879

Renjie Yu

LXMERT (Learning Cross-Modality Encoder Representations from Transformers) is a two-stream cross-modality pre-trained model that performs well on several downstream tasks, namely two visual question answering datasets and a challenging visual-reasoning task (VQA, GQA, and NLVR). The large-scale model still leaves considerable room for improvement: its accuracy remains low, its generalization ability is weak, and it is vulnerable to adversarial attacks. Training LXMERT also takes a great deal of time and money. I therefore try to improve the training speed, generalization ability, and accuracy of the model by enhancing both the training method and the model structure. On the training side, FGM (Fast Gradient Method) adversarial training is introduced in the fine-tuning phase by adding perturbations to the weights of both the language embedding layer and the visual feature linear layer, which effectively improves accuracy and generalization. On the structural side, a weighted residual block improves training speed by 1.6% in the pre-training phase without loss of model performance. Next, the most important structure, the Encoder, is redesigned to help the model converge: its FFN (Feed-Forward Neural Network) is replaced by a GLU (Gated Linear Unit), which also improves the model's fitting ability and performance. The improved model performs better on the VQA task than the baseline (LXMERT). Finally, detailed ablation studies show that these enhancement strategies are effective for LXMERT and examine the effect of each measure on the model.
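As a rough illustration of the two techniques named in the abstract, the sketch below shows the usual FGM perturbation, r = epsilon * g / ||g||_2, and a gated linear unit in its common elementwise form a * sigmoid(b). It is plain Python on lists and is not the author's implementation; the function names, the default `epsilon`, and the sigmoid gate are assumptions made for illustration.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return epsilon * grad / ||grad||_2; all zeros if the gradient is zero."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def sigmoid(z):
    """Logistic function used here as the gate (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-z))


def glu(a, b):
    """Gated linear unit: elementwise a * sigmoid(b).

    In a Transformer block, a and b would be two linear projections
    of the same input; here they are passed in directly.
    """
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]
```

In adversarial fine-tuning of this kind, the perturbation would typically be added to the chosen weights, a second forward and backward pass run on the perturbed model, and the original weights restored before the optimizer step.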


Citations: 0

An Unstructured Data Desensitization Approach for Futures Industry

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581885

Xiaofan Zhi, Li Xue, Sihao Xie

The development of Big Data and artificial intelligence technologies gives financial institutions a powerful boost in data mining, while also making it harder to prevent the disclosure of private data. Data desensitization is one way to protect private data. Compared with structured data desensitization, unstructured data desensitization still faces challenges. On the one hand, the accuracy of text recognition from images, audio, video, and other unstructured data seriously affects desensitization performance. On the other hand, conventional sensitive-information recognition methods, which are rule- and matching-based, often produce unacceptable results on complicated financial data. To address these issues, this paper proposes a new method for unstructured data desensitization. It first uses an evaluation model based on multi-level fine-grained verification of text conversion accuracy to improve text recognition, and then introduces a sensitive-information recognition model based on hybrid analysis to reduce the missed-detection and false-detection rates. The method achieved satisfactory results on real datasets.
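For orientation, the rule- and matching-based baseline that the abstract contrasts itself with can be sketched as a pair of regular expressions. This shows the conventional approach only, not the paper's hybrid model; both patterns and the masking layout are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical patterns: an 11-digit mobile number starting with 1,
# and an 18-character ID number whose last character may be X.
PHONE = re.compile(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)")
ID_CARD = re.compile(r"(?<!\d)(\d{6})\d{8}(\d{3}[\dXx])(?!\d)")


def desensitize(text):
    """Mask the middle digits of matched ID numbers and phone numbers."""
    # Mask the longer ID pattern first, then phone numbers.
    text = ID_CARD.sub(lambda m: m.group(1) + "*" * 8 + m.group(2), text)
    text = PHONE.sub(lambda m: m.group(1) + "*" * 4 + m.group(2), text)
    return text
```

Rules like these miss anything that does not fit a fixed pattern, which is the kind of missed and false detection the abstract says a hybrid model is meant to reduce.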


Citations: 0

Combining GEO Database and the Method of Network Pharmacology to Explore the Molecular Mechanism of Epimedium in the Treatment of Alzheimer's Disease

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581884

Lei Deng, Junli Zhang, K. Cao, Miwei Shang, F. Han

Epimedium, a traditional Chinese medicine, is widely used to treat neurodegenerative diseases such as Alzheimer's disease (AD). However, the conventional proteomics- and genomics-based experimental methods used in previous research struggle to describe comprehensively the mechanism of Epimedium in the treatment of AD. In this study, using computer software together with the GEO database and network pharmacology methods, we established the relevant pharmacological networks and core target networks and analyzed them visually. We then carried out GO and KEGG enrichment analysis to give a relatively comprehensive account of the mechanism of Epimedium in treating AD, and screened the key mechanisms and targets. The results indicated that Epimedium may act on key targets such as PIK3CB and BCL-2 and participate in the regulation of the PI3K-Akt and calcium signaling pathways in the treatment of AD. This study provides a theoretical basis for in-depth analysis of Epimedium and lays a foundation for the development of related new drugs.
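The abstract does not say which statistic its GO and KEGG enrichment analysis used. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below under that assumption; the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(hits, targets, pathway_size, background):
    """P(X >= hits) when `targets` genes are drawn from `background` genes,
    of which `pathway_size` belong to the pathway (hypergeometric tail)."""
    upper = min(targets, pathway_size)
    total = comb(background, targets)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, targets - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests the pathway contains more of the drug's candidate targets than chance would predict; in practice the values would also be corrected for multiple testing across pathways.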


Citations: 0

Activation During Upper Limb Movements Measured with Functional Near-Infrared Spectroscopy in Healthy Elderly Subjects

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581877

Shengcui Cheng, Xiaoling Chen, T. Zhang, Ziyi Wang, Guangzhi He, Y. Tong, P. Xie

Objective: Understanding cortical activation patterns can play an important role in exploring motor control mechanisms in elderly subjects. This study investigates the hemodynamic responses of elderly subjects during upper-limb movements using functional near-infrared spectroscopy (fNIRS). Methods: Multi-channel fNIRS signals were continuously recorded from the bilateral prefrontal cortex (PFC) and motor cortex (MC) of eight healthy elderly subjects during the resting state (RS) and during right and left upper-limb movements (RM and LM). We applied the generalized linear model (GLM) implemented in the NIRS-SPM software to compute the changes in hemoglobin concentration and describe brain activation during the motor tasks. Results: Changes in oxyhemoglobin concentration were concentrated in the left motor cortex during the RM task, and in the right hemisphere, including the prefrontal cortex and motor cortex, during the LM task. Further analysis showed a significant difference between the two hemispheres in the RM and LM tasks, but no difference in the RS task. Conclusions: These findings suggest that fNIRS signals can reliably quantify neuronal activity during limb movements. This study may provide new insight into the motor mechanism of upper-limb movements and is significant for monitoring brain function.
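The GLM step can be pictured as a least-squares fit of each channel's signal to a task regressor plus a constant baseline. The sketch below is a simplification under stated assumptions: it uses a raw box-car regressor with no hemodynamic response convolution, it is not the NIRS-SPM procedure, and the synthetic `hbo` trace is made-up data.

```python
def fit_glm(regressor, signal):
    """Least-squares fit of signal ~ beta * regressor + baseline."""
    n = len(signal)
    mean_x = sum(regressor) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    if sxx == 0:
        raise ValueError("regressor is constant; beta is not identifiable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, signal))
    beta = sxy / sxx
    baseline = mean_y - beta * mean_x
    return beta, baseline


# Synthetic example: 1 marks task blocks, 0 marks rest.
boxcar = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
hbo = [0.1 + 0.5 * b for b in boxcar]  # made-up oxyhemoglobin trace
beta, baseline = fit_glm(boxcar, hbo)
```

A larger fitted `beta` for a channel indicates a stronger task-related change, which is what makes hemisphere-level comparisons such as those in the Results possible.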


Citations: 0

Few-Shot Data Augmentation for Industrial Character Recognition

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581841

Hongchao Gao, Xiaoqian Huang, Bofeng Liu

The task of industrial character recognition is to extract the characters on the surface of a workpiece during industrial production. The limited training data, incomplete character categories, and non-standardized character styles encountered in actual production significantly reduce the recognition performance of deep learning-based methods such as scene text recognition and Optical Character Recognition (OCR). In this paper, we propose an augmentation strategy for industrial character recognition based on the Generative Adversarial Network (GAN). The strategy consists of two modules: a character detection module and a synthetic data generation module. The results show that the augmentation strategy achieves the best generation results, and that a recognition network trained on the augmented dataset it generates achieves the best results on four types of industrial datasets.


Citations: 0

Network Bandwidth Prediction Method Based on Hidden Markov model in High-speed Railway

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581900

Luyao Wang, Jia Guo, Ye Zhu, Heying Song, Yanmin Wei, Jinao Wang

With 5G in full commercial use, high-speed rail passengers have increasingly high requirements for wireless network service quality. However, in current streaming media transmission over high-speed rail 5G networks, the high travel speed causes frequent base-station handovers, and the user bandwidth does not match the streaming bit rate, resulting in a poor network and streaming experience. To address these problems, this paper focuses on bandwidth prediction for network users in the high-speed rail environment and proposes a bandwidth prediction algorithm, High speed 5G Environment Bandwidth Predict (H5EBP), based on a hidden Markov model in different states of the high-speed rail, so as to improve the user's streaming media experience. In a comparative evaluation against existing bandwidth prediction algorithms, H5EBP greatly improves the accuracy of bandwidth prediction and thereby the user's streaming media experience.
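To make the hidden-Markov idea concrete, the sketch below runs standard forward filtering over quantized bandwidth observations and returns the expected bandwidth one step ahead. It is a generic HMM predictor, not H5EBP itself; the discrete observation levels, the per-state mean bandwidths, and all names are assumptions for illustration.

```python
def _normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation sequence has zero probability under the model")
    return [w / total for w in weights]


def predict_next_bandwidth(observations, start, trans, emit, state_means):
    """Expected next-step bandwidth given a sequence of observation indices.

    start[i]       : initial probability of hidden state i
    trans[i][j]    : probability of moving from state i to state j
    emit[i][o]     : probability of observing level o in state i
    state_means[i] : mean bandwidth associated with state i (assumed known)
    """
    n = len(start)
    # Filtering: belief over hidden states after each observation.
    belief = _normalize([start[i] * emit[i][observations[0]] for i in range(n)])
    for obs in observations[1:]:
        prior = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        belief = _normalize([prior[j] * emit[j][obs] for j in range(n)])
    # One-step-ahead state distribution, then expected bandwidth.
    next_state = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return sum(p * m for p, m in zip(next_state, state_means))
```

A streaming client could feed such a prediction into its bit-rate selection; how the paper defines the train's states and estimates the model parameters is not stated in the abstract.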


Citations: 0

DQN Method Analysis for Network Routing of Electric Optical Communication Network

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581898

Yuqing Zhong, Xiong Wei Zhang, Wuhua Xu

Route planning in electric optical communication networks plays a crucial role in communication reliability and performance. To apply reinforcement learning and obtain optimized routing results, the Deep Q Network (DQN), which has been proven to be a high-performance neural network model, is analyzed for electric optical network routing. Depending on network function and structure, a large-scale electric optical communication network can be divided into several sub-networks for faster training. An advanced DQN model is analyzed and trained for a 200-node communication network and a 700-node communication network. The training results for networks of different scales, which demonstrate the effectiveness of the method, are given with reward data and running time for comparison. The method can be used for dynamic route planning in large-scale electric communication networks.
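Two pieces of a DQN-style routing agent can be shown without a neural network: the bootstrapped training target r + gamma * max Q(s', a') and an epsilon-greedy choice of next hop. The sketch below covers only these pieces; the reward design, the node names, and the dictionary of Q-values standing in for network outputs are all hypothetical.

```python
import random


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN regression target for one transition."""
    if done or not next_q_values:
        return reward  # terminal step: no bootstrapping
    return reward + gamma * max(next_q_values)


def choose_next_hop(q_values, neighbors, epsilon=0.1, rng=random):
    """Epsilon-greedy selection among neighboring nodes.

    q_values maps each neighbor to its estimated Q-value; in a real DQN
    these would come from the network's forward pass.
    """
    if rng.random() < epsilon:
        return rng.choice(neighbors)  # explore
    return max(neighbors, key=lambda node: q_values[node])  # exploit
```

Splitting a large network into sub-networks, as the abstract describes, would shrink the action set each agent has to score at every hop.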


Citations: 0

Comparative Study on EEG Feature Recognition based on Deep Belief Network

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581871

Guangrong Liu, Bin Hao, Abdelkader Nasreddine Belkacem, Jiaxin Zhang, Penghai Li, Jun Liang, Changming Wang, Chao Chen

In Brain Computer Interface (BCI) systems, motor imagery faces several problems, such as difficulty in extracting EEG signal features, low classification and recognition accuracy, and long training time and gradient saturation in feature classification with traditional deep neural networks. In this paper, a deep belief network (DBN) model is proposed. Fast Fourier transform (FFT) and wavelet transform (WT) were combined with the DBN deep learning model to extract feature vectors from the time-frequency signals of different leads, superimpose and average them, and then perform classification experiments. The number of DBN layers and the number of neurons in each layer were determined iteratively. Through backward fine-tuning, the optimal weight coefficient W and bias term B are determined layer by layer, which addresses the training and optimization problems of deep neural networks. A motor imagery and motion observation (MI-AO) experiment is designed, and its results are compared with the public dataset BCI Competition IV 2a. Compared with other algorithms, the DBN model achieves an average accuracy of 83.81% for binary classification and 80.77% for four-class classification.
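The frequency-domain part of the feature extraction can be illustrated with band power computed from a discrete Fourier transform. The sketch below uses a direct DFT for clarity (an FFT computes the same coefficients faster); the band edges, the sampling rate, and the synthetic 10 Hz signal are assumptions, not values taken from the paper.

```python
import cmath
import math


def band_power(signal, fs, low_hz, high_hz):
    """Sum of |X[k]|^2 / N over DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if low_hz <= freq <= high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power += abs(coeff) ** 2 / n
    return power


# Synthetic one-second lead sampled at 100 Hz with a pure 10 Hz rhythm.
fs = 100
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
mu_power = band_power(signal, fs, 8, 12)     # band containing the 10 Hz tone
beta_power = band_power(signal, fs, 18, 26)  # band with no signal energy
```

Per-lead band powers like these could be stacked into the feature vector passed to a classifier such as the DBN.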


Citations: 0

Nutrient Deficiency Diagnosis of Plants Based on Transfer Learning and Lightweight Convolutional Neural Networks MobileNetV3-Large

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581812

Qian Yan, Xuhong Lin, Wenwen Gong, Caicong Wu, Yifei Chen

Nutrient deficiency diagnosis of plants is an important application in precision agriculture. At present it depends mainly on manual identification, which makes efficiency and accuracy hard to guarantee. Focusing on the difficult convergence and poor real-time performance of existing deep convolutional neural networks in detecting plant nutrient deficiency, this study proposes a lightweight deep learning model, UMNet (Nutrient-MobileNetV3-Network), for plant nutrient deficiency detection. The approach augments the collected rice leaf images to expand the dataset, transfers the knowledge learned by the MobileNetV3-Large network on the ImageNet dataset to UMNet, redesigns the fully connected layer, and uses a new activation function. The experimental results show that: (1) Transfer learning solves the problem of insufficient training data. Compared with training without transfer learning, accuracy increases by 7.22% to 9.63%, and both convergence speed and recognition accuracy improve greatly. (2) Compared with complex convolutional neural networks (CNNs) such as InceptionV3, InceptionResnetV2, and VGG16, the lightweight network UMNet has lower storage requirements and shorter training time while still maintaining high accuracy, and its recognition accuracy is better than that of other lightweight networks of similar complexity: ShuffleNetV2, EfficientNetB0, and Xception. The identification accuracy of UMNet reaches 97.80%, and a single training epoch takes about 46.4 s. Predicting the nutrient deficiency of a single object takes only 1.45 s, which enables intelligent detection of plant nutrient deficiency and should promote academic exploration of deep learning in plant phenotype research.


Citations: 0

VRGNet: A Robust Visible Region-Guided Network for Occluded Pedestrian Detection

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pub Date : 2022-11-17 DOI: 10.1145/3581807.3581817

Xin Mao, Chaoqi Yan, Hong Zhang, J. Song, Ding Yuan

Pedestrian detection has made significant progress in both academia and industry. However, occluded scenes still pose challenging problems. In this paper, we propose a novel and robust visible region-guided network (VRGNet) designed specifically to improve occluded pedestrian detection. Specifically, we leverage an adapted FPN-based framework to extract multi-scale features and fuse them to encode more precise localization and semantic information. In addition, we construct a pedestrian part pool that covers almost all scales of the different occluded body regions. We also propose a new occlusion handling strategy that integrates prior knowledge of different visible body regions, together with visibility prediction, into the detection framework to handle pedestrians with different degrees of occlusion. Extensive experiments demonstrate that VRGNet achieves leading performance under different evaluation settings on the Caltech-USA dataset, especially for occluded pedestrians. It also achieves competitive results of 48.4%, 9.3%, and 6.7% under the Heavy, Partial, and Bare settings, respectively, on the CityPersons dataset compared with other state-of-the-art pedestrian detection algorithms, while keeping a better speed-accuracy trade-off.

Citations: 0