LXMERT (Learning Cross-Modality Encoder Representations from Transformers) is a two-stream cross-modality pre-trained model that performs well on several downstream tasks, namely two visual question answering datasets and a challenging visual-reasoning task (VQA, GQA, and NLVR). However, this large-scale model still leaves considerable room for improvement: its accuracy is low, its generalization ability is weak, and it is vulnerable to adversarial attacks. Furthermore, training LXMERT takes a lot of time and money, so improvement is urgently needed. I therefore try to improve the training speed, generalization ability, and accuracy of the model by enhancing both the training method and the model structure. For the training method, FGM (Fast Gradient Method) adversarial training is introduced in the fine-tuning phase by adding perturbations to the weights of both the language embedding layer and the visual-feature linear layer, which effectively improves accuracy and generalization. For the model structure, a weighted residual block speeds up training by 1.6% in the pre-training phase without loss of model performance. Next, the most important structure, the Encoder, is redesigned to help the model converge: the Encoder's FFN (Feed-Forward Neural Network) is replaced by a GLU (Gated Linear Unit), which also improves the model's fitting ability and performance. The improved model performs better on the VQA task than the baseline (i.e., LXMERT). Finally, detailed ablation studies show that these enhancement strategies are effective for LXMERT and reveal the contribution of each measure.
RGFGM-LXMERT-An Improve Architecture Based On LXMERT. Renjie Yu. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581879
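The two building blocks named in the RGFGM-LXMERT abstract can be illustrated with a minimal, framework-free sketch. It assumes the standard FGM formulation, r = epsilon * g / ||g||_2, and the standard bias-free GLU, (W_value x) * sigmoid(W_gate x). The function names, the flat-list weight layout, and the default `epsilon` are illustrative assumptions, not details taken from the paper.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Standard FGM step: r = epsilon * g / ||g||_2 (zero if the gradient is zero)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_and_restore(weights, grad, epsilon=1.0):
    """Return (adversarial weights, backup copy).

    The caller would run a second forward/backward pass with the adversarial
    weights and then put the backup copy back before the optimizer step.
    """
    backup = list(weights)
    r = fgm_perturbation(grad, epsilon)
    adversarial = [w + d for w, d in zip(weights, r)]
    return adversarial, backup


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(x, w_value, w_gate):
    """Gated linear unit without biases: (W_value x) * sigmoid(W_gate x), element-wise.

    Each weight matrix is a list of rows, one row per output unit.
    """
    def matvec(w):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    return [v * sigmoid(g) for v, g in zip(matvec(w_value), matvec(w_gate))]
```

In a real fine-tuning loop the gradient would come from the framework's autograd and the perturbation would be applied to the embedding and visual-projection weight tensors; the lists here only show the arithmetic.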
The development of Big Data and artificial intelligence technologies gives financial institutions a powerful boost in data mining, while also making it harder to prevent the disclosure of private data. Data desensitization is one way to protect private data. Compared with structured data desensitization, unstructured data desensitization still faces challenges. On the one hand, the accuracy of text recognition from images, voice recordings, videos, and other types of unstructured data seriously affects desensitization performance. On the other hand, conventional sensitive-information recognition methods, which are rule-based and matching-based, often produce unacceptable desensitization results on complicated financial data. To address these issues, this paper proposes a new method for unstructured data desensitization. It first uses an evaluation model based on multi-level fine-grained verification of text conversion accuracy to improve text recognition, and then introduces a sensitive-information recognition model based on hybrid analysis to reduce the rates of missed and false detections. The method achieved satisfactory results on real datasets.
An Unstructured Data Desensitization Approach for Futures Industry. Xiaofan Zhi, Li Xue, Sihao Xie. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581885
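For context, the conventional rule-and-matching baseline that the desensitization abstract contrasts itself with might look like the sketch below. It does not implement the paper's hybrid-analysis model. The two patterns (an 11-digit number starting with 1, and an 18-character identifier) and the masking policy are hypothetical rules chosen only for illustration.

```python
import re

# Hypothetical rules for illustration only; real financial text needs far richer patterns.
RULES = {
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    "id_number": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def mask(match):
    """Keep the first three and last two characters, replace the rest with '*'."""
    text = match.group(0)
    return text[:3] + "*" * (len(text) - 5) + text[-2:]


def desensitize(text):
    """Apply every rule in turn and mask whatever it matches."""
    for pattern in RULES.values():
        text = pattern.sub(mask, text)
    return text
```

A baseline like this misses anything that does not fit a pattern exactly (for example a number split by spaces after imperfect speech or image transcription), which is the kind of missed detection the abstract says the hybrid model is meant to reduce.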
Lei Deng, Junli Zhang, K. Cao, Miwei Shang, F. Han
Epimedium, a traditional Chinese medicine, is widely used to treat neurodegenerative diseases such as Alzheimer's disease (AD). However, the conventional experimental methods based on proteomics and genomics used in previous research struggle to describe comprehensively the mechanism of Epimedium in the treatment of AD. In this study, with the help of computer software and by combining the GEO database with network pharmacology, the relevant pharmacological networks and core target networks were established and analyzed visually. We then carried out GO and KEGG enrichment analysis to give a relatively comprehensive account of the mechanism of Epimedium in treating AD, and screened the key mechanisms and targets. The results indicate that Epimedium may act on key targets such as PIK3CB and BCL-2 and participate in the regulation of the PI3K-Akt and calcium signaling pathways in the treatment of AD. This study provides a theoretical basis for in-depth analysis of Epimedium and lays a foundation for the development of related new drugs.
Combining GEO Database and the Method of Network Pharmacology to Explore the Molecular Mechanism of Epimedium in the Treatment of Alzheimer's Disease. Lei Deng, Junli Zhang, K. Cao, Miwei Shang, F. Han. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581884
Shengcui Cheng, Xiaoling Chen, T. Zhang, Ziyi Wang, Guangzhi He, Y. Tong, P. Xie
Objective: Understanding cortical activation patterns can play an important role in exploring motor control mechanisms in elderly subjects. This study investigates hemodynamic responses in elderly subjects during upper-limb movements using functional near-infrared spectroscopy (fNIRS). Methods: Multi-channel fNIRS signals were continuously recorded from the bilateral prefrontal cortex (PFC) and motor cortex (MC) of eight healthy elderly subjects during the resting state (RS) and during right and left upper-limb movements (RM and LM). We applied the generalized linear model (GLM) implemented in the NIRS-SPM software to compute changes in hemoglobin concentration and describe brain activation during the motor tasks. Results: Changes in oxyhemoglobin concentration were concentrated in the left motor cortex during the RM task, and in the right hemisphere, including the prefrontal cortex and motor cortex, during the LM task. Further analysis showed a significant difference between the two hemispheres in the RM and LM tasks, but no difference in the RS task. Conclusions: These findings suggest that fNIRS signals can reliably quantify neuronal activity during limb movements. This study may provide new insight into the motor mechanism of upper-limb movements and is significant for monitoring brain function.
Activation During Upper Limb Movements Measured with Functional Near-Infrared Spectroscopy in Healthy Elderly Subjects. Shengcui Cheng, Xiaoling Chen, T. Zhang, Ziyi Wang, Guangzhi He, Y. Tong, P. Xie. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581877
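The GLM step in the fNIRS abstract can be pictured as a least-squares fit of a task regressor to a channel's hemoglobin signal. The sketch below is a simplified assumption: it uses a plain boxcar regressor and an intercept, whereas NIRS-SPM additionally convolves the task design with a hemodynamic response function and handles noise modelling. The function name, block lengths, and synthetic signal are all hypothetical.

```python
def fit_glm(signal, task):
    """Ordinary least squares for signal = b0 + b1 * task + noise.

    b1 is the estimated task-related change (the activation estimate for this
    channel); b0 is the baseline level.
    """
    n = len(signal)
    mean_x = sum(task) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in task)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(task, signal))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical block design: 10 rest samples then 10 task samples, repeated 3 times.
task = ([0.0] * 10 + [1.0] * 10) * 3
# Noise-free synthetic oxyhemoglobin trace with baseline 0.2 and task effect 0.5.
signal = [0.2 + 0.5 * t for t in task]
baseline, activation = fit_glm(signal, task)
```

Comparing the fitted `activation` value between left- and right-hemisphere channels is the kind of contrast the Results sentence describes.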
The task of industrial character recognition is to extract the characters on the surface of a workpiece during industrial production. Limited training data, incomplete coverage of character categories, and non-standardized character styles encountered in actual production significantly reduce the recognition performance of deep learning-based methods such as scene text recognition and Optical Character Recognition (OCR). In this paper, we propose an augmentation strategy for industrial character recognition based on the Generative Adversarial Network (GAN). The strategy consists of two modules: a character detection module and a synthetic data generation module. The results show that the augmentation strategy achieves the best generation results, and that a recognition network trained on the augmented dataset generated by the strategy achieves the best results on four types of industrial datasets.
Few-Shot Data Augmentation for Industrial Character Recognition. Hongchao Gao, Xiaoqian Huang, Bofeng Liu. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581841
Luyao Wang, Jia Guo, Ye Zhu, Heying Song, Yanmin Wei, Jinao Wang
With 5G now in full commercial use, high-speed rail passengers have increasingly high requirements for wireless network service quality. However, in current streaming media transmission over high-speed rail 5G networks, the train's high speed causes frequent base-station handovers, and the user bandwidth does not match the streaming media bit rate, resulting in a poor network and streaming experience. To address these problems, this paper focuses on bandwidth prediction for network users in the high-speed rail environment and proposes a bandwidth prediction algorithm, High speed 5G Environment Bandwidth Predict (H5EBP), based on a hidden Markov model over the different states of the high-speed train, so as to improve the user's streaming media experience. In a comparative evaluation against existing bandwidth prediction algorithms, H5EBP greatly improves the accuracy of bandwidth prediction, thereby improving the user's streaming media experience.
Network Bandwidth Prediction Method Based on Hidden Markov model in High-speed Railway. Luyao Wang, Jia Guo, Ye Zhu, Heying Song, Yanmin Wei, Jinao Wang. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581900
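How a hidden Markov model can drive a bandwidth forecast is sketched below, as a generic illustration rather than the H5EBP algorithm itself. The sketch filters a sequence of discretised bandwidth observations with the forward algorithm, then predicts the next-step expected bandwidth from the filtered state belief. The two hidden states, the observation coding, and every probability and bandwidth value in the usage example are hypothetical.

```python
def forward_filter(observations, initial, transition, emission):
    """Normalised forward pass: returns P(state at last step | all observations)."""
    n = len(initial)
    belief = [initial[s] * emission[s][observations[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for obs in observations[1:]:
        # Propagate the belief one step, then weight by the new observation.
        predicted = [sum(belief[p] * transition[p][s] for p in range(n)) for s in range(n)]
        belief = [predicted[s] * emission[s][obs] for s in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def predict_bandwidth(belief, transition, mean_bandwidth):
    """Expected bandwidth at the next step given the current state belief."""
    n = len(belief)
    next_state = [sum(belief[p] * transition[p][s] for p in range(n)) for s in range(n)]
    return sum(next_state[s] * mean_bandwidth[s] for s in range(n))


# Hypothetical two-state model: state 0 = stable cell, state 1 = handover in progress.
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]   # observation 0 = "high" sample, 1 = "low" sample
mean_bandwidth = [50.0, 10.0]         # Mbit/s per state, made-up values

belief = forward_filter([0, 0, 1, 1], initial, transition, emission)
forecast = predict_bandwidth(belief, transition, mean_bandwidth)
```

A streaming client could use a forecast like this to pick the next segment's bit rate; in practice the model parameters would be learned from measured traces.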
Route planning for electric optical communication networks plays a crucial role in communication reliability and performance. To carry out reinforcement learning and obtain optimized routing results, the Deep Q Network (DQN), which has proven to be a high-performance neural network model, is analyzed for electric optical network routing. Depending on network function and structure, a large-scale electric optical communication network can be divided into several sub-networks for better training speed. An advanced DQN model is analyzed and trained for a 200-node communication network and a 700-node communication network. The training results for the networks of different scales, which demonstrate the effectiveness of the method, are reported with reward data and running time for comparison. This method can be used for dynamic route planning of a large-scale electric communication network.
DQN Method Analysis for Network Routing of Electric Optical Communication Network. Yuqing Zhong, Xiong Wei Zhang, Wuhua Xu. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581898
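The routing idea behind the DQN abstract can be shown on a toy graph with a Q table. This is a deliberate simplification: a DQN replaces the table with a neural network and learns from sampled experience, whereas the sketch below applies the Q-learning update rule in deterministic sweeps over every link. The four-node topology, link costs, reward definition (negative link cost), and hyperparameters are hypothetical.

```python
def train_q_table(links, destination, alpha=0.5, gamma=1.0, sweeps=200):
    """Apply Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a)) over all links."""
    q = {node: {nbr: 0.0 for nbr in nbrs} for node, nbrs in links.items()}
    for _ in range(sweeps):
        for node, nbrs in links.items():
            if node == destination:
                continue  # the destination is terminal, nothing to learn there
            for nbr, cost in nbrs.items():
                future = 0.0 if nbr == destination else max(q[nbr].values())
                target = -cost + gamma * future  # reward is the negative link cost
                q[node][nbr] += alpha * (target - q[node][nbr])
    return q


def greedy_route(q, source, destination, max_hops=10):
    """Follow the highest-valued next hop until the destination or the hop limit."""
    route = [source]
    while route[-1] != destination and len(route) <= max_hops:
        current = route[-1]
        route.append(max(q[current], key=q[current].get))
    return route


# Hypothetical undirected topology: node -> {neighbour: link cost}.
LINKS = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.0, "D": 5.0},
    "C": {"A": 4.0, "B": 1.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}

q_table = train_q_table(LINKS, destination="D")
route = greedy_route(q_table, "A", "D")
```

On this graph the learned values favour the three-hop path through B and C (total cost 3) over the direct but more expensive links. Splitting a large network into sub-networks, as the abstract describes, would correspond to training one such value function per sub-network.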
Guangrong Liu, Bin Hao, Abdelkader Nasreddine Belkacem, Jiaxin Zhang, Penghai Li, Jun Liang, Changming Wang, Chao Chen
In Brain-Computer Interface (BCI) systems, motor imagery poses several problems, such as difficulty in extracting EEG signal features, low classification and recognition accuracy, long training time, and gradient saturation in feature classification based on traditional deep neural networks. In this paper, a deep belief network (DBN) model is proposed. Fast Fourier transform (FFT) and wavelet transform (WT), combined with the DBN deep learning model, were used to extract the feature vectors of the time-frequency signals of different leads, which were superimposed and averaged before classification experiments were performed. The number of DBN layers and the number of neurons in each layer were determined by iteration. Through backward fine-tuning, the optimal weight coefficients W and bias terms B are determined layer by layer, which addresses the training and optimization problems of deep neural networks. A motor imagery and action observation (MI-AO) experiment is designed, and its results are compared with those on the public dataset BCI Competition IV 2a. In a comparison with other algorithms, the DBN model achieves an average accuracy of 83.81% for binary classification and 80.77% for four-class classification.
Comparative Study on EEG Feature Recognition based on Deep Belief Network. Guangrong Liu, Bin Hao, Abdelkader Nasreddine Belkacem, Jiaxin Zhang, Penghai Li, Jun Liang, Changming Wang, Chao Chen. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581871
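The frequency-domain feature extraction mentioned in the EEG abstract can be sketched as band-power features per lead, averaged across leads. This is an assumed reading of "superimposed and averaged", not the paper's exact pipeline. For self-containment the sketch uses a direct discrete Fourier transform, which gives the same coefficients an FFT computes faster. The band limits (8 to 12 Hz and 18 to 26 Hz), the sampling rate, and the synthetic signals are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Direct DFT for the non-negative frequency bins (an FFT gives the same values)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def band_power(signal, sample_rate, low_hz, high_hz):
    """Sum of |X[k]|^2 / n over the bins whose frequency lies in [low_hz, high_hz]."""
    n = len(signal)
    power = 0.0
    for k, coeff in enumerate(dft(signal)):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            power += abs(coeff) ** 2 / n
    return power


def averaged_features(leads, sample_rate, bands):
    """One band-power vector per lead, then the element-wise mean across leads."""
    per_lead = [[band_power(sig, sample_rate, lo, hi) for lo, hi in bands] for sig in leads]
    return [sum(col) / len(per_lead) for col in zip(*per_lead)]


# Hypothetical data: two leads, 1 s at 128 Hz, each carrying a 10 Hz rhythm.
RATE = 128
lead_a = [math.sin(2 * math.pi * 10 * t / RATE) for t in range(RATE)]
lead_b = [0.5 * math.sin(2 * math.pi * 10 * t / RATE) for t in range(RATE)]
features = averaged_features([lead_a, lead_b], RATE, bands=[(8, 12), (18, 26)])
```

A vector like `features` would then be fed to the classifier (a DBN in the paper); wavelet features could be appended in the same way.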
Nutrient deficiency diagnosis of plants is an important application in precision agriculture. At present, it depends mainly on manual identification, which makes efficiency and accuracy difficult to ensure. Focusing on the difficult convergence and poor real-time performance of existing deep convolutional neural networks in detecting plant nutrient deficiency, this study proposes a lightweight deep learning model, UMNet (Nutrient-MobileNetV3-Network), for plant nutrient deficiency detection. The approach augments the collected rice leaf images to expand the dataset, transfers the knowledge learned by the MobileNetV3-Large network on the ImageNet dataset to UMNet, redesigns the fully connected layer, and uses a new activation function. The experimental results show that: (1) Transfer learning solves the problem of insufficient training data. Compared with training without transfer learning, accuracy increases by 7.22% to 9.63%, and convergence speed and recognition accuracy improve greatly. (2) Compared with complex convolutional neural networks (CNNs) such as InceptionV3, InceptionResNetV2, and VGG16, the lightweight network UMNet has lower storage requirements and shorter training time. At the same time, it still ensures high accuracy, and its recognition accuracy is better than that of other lightweight networks of similar complexity: ShuffleNetV2, EfficientNetB0, and Xception. The identification accuracy of UMNet reaches 97.80%, and a single training epoch takes about 46.4 s. Predicting the nutrient deficiency of a single object takes only 1.45 s, which enables intelligent detection of plant nutrient deficiency and will promote academic exploration of deep learning in plant phenotype research.
Nutrient Deficiency Diagnosis of Plants Based on Transfer Learning and Lightweight Convolutional Neural Networks MobileNetV3-Large. Qian Yan, Xuhong Lin, Wenwen Gong, Caicong Wu, Yifei Chen. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581812
Xin Mao, Chaoqi Yan, Hong Zhang, J. Song, Ding Yuan
Pedestrian detection has made significant progress in both academic and industrial fields. However, occluded scenes remain challenging. In this paper, we propose a novel and robust visible region-guided network (VRGNet) designed specifically to improve occluded pedestrian detection. Specifically, we use an adapted FPN-based framework to extract multi-scale features and fuse them to encode more precise localization and semantic information. In addition, we construct a pedestrian part pool that covers almost all scales of the different occluded body regions. We also propose a new occlusion-handling strategy that integrates prior knowledge of different visible body regions, together with visibility prediction, into the detection framework to handle pedestrians with different degrees of occlusion. Extensive experiments demonstrate that VRGNet achieves leading performance under different evaluation settings on the Caltech-USA dataset, especially for occluded pedestrians. It also achieves competitive results of 48.4%, 9.3%, and 6.7% under the Heavy, Partial, and Bare settings, respectively, on the CityPersons dataset compared with other state-of-the-art pedestrian detection algorithms, while maintaining a better speed-accuracy trade-off.
{"title":"VRGNet: A Robust Visible Region-Guided Network for Occluded Pedestrian Detection","authors":"Xin Mao, Chaoqi Yan, Hong Zhang, J. Song, Ding Yuan","doi":"10.1145/3581807.3581817","DOIUrl":"https://doi.org/10.1145/3581807.3581817","url":null,"abstract":"Pedestrian detection has made significant progress in both academic and industrial fields. However, occluded scenes remain challenging. In this paper, we propose a novel and robust visible region-guided network (VRGNet) designed specifically to improve occluded pedestrian detection. Specifically, we use an adapted FPN-based framework to extract multi-scale features and fuse them to encode more precise localization and semantic information. In addition, we construct a pedestrian part pool that covers almost all scales of the different occluded body regions. We also propose a new occlusion-handling strategy that integrates prior knowledge of different visible body regions, together with visibility prediction, into the detection framework to handle pedestrians with different degrees of occlusion. Extensive experiments demonstrate that VRGNet achieves leading performance under different evaluation settings on the Caltech-USA dataset, especially for occluded pedestrians. 
It also achieves competitive results of 48.4%, 9.3%, and 6.7% under the Heavy, Partial, and Bare settings, respectively, on the CityPersons dataset compared with other state-of-the-art pedestrian detection algorithms, while maintaining a better speed-accuracy trade-off.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123704960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
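The multi-scale fusion mentioned in the VRGNet abstract can be illustrated with a generic FPN-style top-down pass. This is a minimal sketch, not VRGNet's adapted framework: it assumes single-channel feature maps stored as nested lists, a fixed 2x scale step between levels, and nearest-neighbor upsampling, and it omits the lateral 1x1 and smoothing 3x3 convolutions of a standard FPN. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor upsampling: repeat every value twice along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels):
    """Fuse feature maps ordered from finest to coarsest resolution.

    The coarsest map is passed through unchanged; every finer map is summed
    with the upsampled fused map from the level above it.
    """
    fused = [levels[-1]]
    for lateral in reversed(levels[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    fused.reverse()  # restore finest-to-coarsest order
    return fused


# Toy pyramid (hypothetical values): 4x4, 2x2, and 1x1 maps.
c3 = [[0.0] * 4 for _ in range(4)]
c4 = [[1.0, 2.0], [3.0, 4.0]]
c5 = [[1.0]]

for level in top_down_fuse([c3, c4, c5]):
    print(level)
```

After fusion, each finer level carries the coarse semantic signal from the levels above it in addition to its own higher-resolution detail, which is the motivation for combining scales when pedestrians appear at very different sizes.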