Scene text recognition (STR) has attracted extensive attention in the pattern recognition community. With the development of deep learning, object detection and sequence recognition schemes based on deep neural networks have been widely applied to this task. Discriminative features play a vital role when text appears against complex scene backgrounds; however, for specific tasks, inappropriate pooling strategies may discard feature details. To tackle this problem, this paper proposes an end-to-end adaptive importance pooling network (AIPN). Concretely, we embed the novel adaptive importance pooling (AIP) strategy into the feature extraction stage. Additionally, we adopt an attention-based LSTM as the decoder, so that informative image regions are attended to automatically while predicting the final recognition results. Furthermore, to reduce the burden of feature representation for subsequent recognition, a text rectification network (TRN), supervised by the recognition module, is used to normalize the input text images. Experimental results show that our model achieves encouraging performance on the STR benchmark datasets IIIT5K, SVT, ICDAR-2003, and ICDAR-2013.
"Adaptive Importance Pooling Network for Scene Text Recognition." Peng Ren, Qingsong Yu, Xuanqi Wu, Ziyang Wang. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404614.
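The attention-based LSTM decoder described in this abstract can be illustrated with a minimal sketch. This is not the paper's AIPN implementation; it is a generic additive-attention step over encoder features, with all weight shapes (`W_f`, `W_h`, `v`) chosen purely for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(features, h_prev, W_f, W_h, v):
    """One additive-attention step: score each encoder feature column
    against the decoder state, normalize, and return a context vector."""
    # features: (T, D) encoder feature sequence; h_prev: (H,) decoder state
    scores = np.tanh(features @ W_f + h_prev @ W_h) @ v   # (T,) raw scores
    alpha = softmax(scores)                               # attention weights
    context = alpha @ features                            # (D,) weighted sum
    return context, alpha
```

At each decoding step the LSTM would consume `context` alongside its previous output symbol; `alpha` indicates which feature regions the decoder is focusing on.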
Yingze Mu, Chao-Yi Dong, Qi-Ming Chen, Bochen Li, Zhi-Qiang Fan
Realizing autonomous positioning and map construction for mobile robots in unknown environments is crucial for obstacle avoidance and path planning. In this paper, an improved ORB (Oriented FAST and Rotated BRIEF)-SLAM2 (Simultaneous Localization And Mapping 2) algorithm is used to localize the robot and construct a 3D (three-dimensional) point cloud map of its environment. The improved ORB-SLAM2 algorithm works as follows: first, after the environment map is constructed, a map-saving function is added to support map type conversion and navigation obstacle avoidance. Then, the PCL (Point Cloud Library) is employed to convert the saved 3D point cloud map into an octomap. A path planning algorithm for mobile robots is implemented on the basis of the octomap, and the robot's dynamic global path planning uses an RRT (Rapidly-exploring Random Tree) algorithm. Experimental results on map construction and path planning show that the proposed scheme effectively realizes obstacle avoidance and path planning for the mobile robot, providing a basis for further realizing the robot's autonomous movement.
"Research on Navigation and Path Planning of Mobile Robot Based on Vision Sensor." Yingze Mu, Chao-Yi Dong, Qi-Ming Chen, Bochen Li, Zhi-Qiang Fan. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404589.
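The RRT planner used for global path planning admits a compact sketch. This is a textbook 2D RRT, not the authors' octomap-based implementation; the 10×10 world bounds and the `is_free` collision callback are illustrative assumptions:

```python
import numpy as np

def rrt(start, goal, is_free, n_iter=4000, step=0.5, goal_tol=0.5, seed=0):
    """Minimal 2D RRT: repeatedly sample a random point, steer the nearest
    tree node toward it by `step`, keep the new node if collision-free,
    and stop once a node lands within `goal_tol` of the goal."""
    rng = np.random.default_rng(seed)
    goal = np.asarray(goal, float)
    nodes = np.empty((n_iter + 1, 2))
    nodes[0] = start
    parents = [-1]
    n = 1
    for _ in range(n_iter):
        sample = rng.uniform(0.0, 10.0, size=2)      # sample in 10x10 world
        d2 = ((nodes[:n] - sample) ** 2).sum(axis=1)
        i = int(np.argmin(d2))                       # nearest tree node
        dist = np.sqrt(d2[i])
        if dist < 1e-12:
            continue
        new = nodes[i] + (sample - nodes[i]) / dist * min(step, dist)
        if not is_free(new):
            continue                                 # rejected: in an obstacle
        nodes[n] = new
        parents.append(i)
        n += 1
        if np.linalg.norm(new - goal) < goal_tol:    # goal reached: backtrack
            path, j = [], n - 1
            while j != -1:
                path.append(nodes[j].copy())
                j = parents[j]
            return path[::-1]
    return None                                      # no path found in budget
```

An octomap-backed planner would implement `is_free` by querying occupied voxels instead of the analytic obstacle used here.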
Lei Geng, Changshun Yin, Zhitao Xiao, Fang Zhang, Jun Wu
To accurately measure the deviation between car seat cutting pieces and CAD templates, and thereby evaluate the production quality of the cutting pieces, this paper proposes a matching algorithm for car seat cutting pieces and CAD templates based on feature retrieval and shape segmentation. The algorithm processes cutting-piece images collected by an acquisition system that combines a backlight board with a CCD camera. First, according to the geometric characteristics of the CAD templates, a CAD retrieval method based on image edge shape features is proposed. Then, in view of the flexible nature of car seat cutting pieces, a matching algorithm for cutting pieces and CAD templates based on shape segmentation is proposed. Finally, the coordinate systems of the cutting piece and the CAD template are unified by an affine transformation, and the deviation between the two is calculated. Extensive experiments in a 700 mm × 500 mm field of view show that the proposed method effectively improves the matching accuracy between cutting pieces and CAD templates, verifying its effectiveness.
"Cutting Piece and CAD Matching Method Based on Feature Retrieval and Shape Segmentation." Lei Geng, Changshun Yin, Zhitao Xiao, Fang Zhang, Jun Wu. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404611.
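The final step above, unifying the two coordinate systems by an affine transformation and computing the deviation, can be sketched as a least-squares fit over corresponding contour points. The point correspondence itself (the paper's shape-segmentation matching) is assumed to have been established already:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src (N, 2) onto dst (N, 2):
    solves dst ~= src @ A + t via an augmented linear system."""
    X = np.hstack([src, np.ones((len(src), 1))])  # (N, 3): [x, y, 1] rows
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2): rows of A, then t
    return M

def deviation(src, dst, M):
    """Per-point distance between the aligned contour and the template."""
    X = np.hstack([src, np.ones((len(src), 1))])
    return np.linalg.norm(X @ M - dst, axis=1)
```

After alignment, the per-point deviations can be thresholded against a production tolerance to grade cutting-piece quality.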
With the rapid development of the Internet, document data have become an important source of information in the financial field, and financial document sentiment analysis has attracted increasing attention. Extracting sentiments manually from large numbers of financial documents is obviously impractical, but natural language processing (NLP) technology can solve this problem. This paper focuses on the research reports of listed companies, a kind of long financial document written by domain experts. We propose a hierarchical label embedding neural network model for sentiment analysis of financial documents. The model adopts a hierarchical network structure to capture the structural information of financial documents, and it includes a label embedding mechanism for focusing on important content. We assume that most of the words and sentences in a document are consistent with the sentiment label assigned by the author, so the label embedding mechanism can pay more attention to label-consistent content during the document's hierarchical representation. Experiments show that our method is more effective than other advanced methods on the constructed dataset.
"Hierarchical Label Embedding Networks for Financial Document Sentiment Analysis." Ping Yao, Qinke Peng, Tian Han. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404583.
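The label embedding mechanism, attending to sentences that agree with the document's sentiment label, can be sketched as similarity-weighted pooling. This is a schematic reading of the abstract, not the authors' architecture; all vector dimensions are illustrative:

```python
import numpy as np

def label_attention(sent_vecs, label_vec):
    """Label-guided attention: weight sentence vectors by their dot-product
    similarity to the label embedding, then pool into a document vector."""
    scores = sent_vecs @ label_vec        # (S,) similarity per sentence
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                   # attention distribution over sentences
    doc_vec = alpha @ sent_vecs           # (D,) label-aware document vector
    return doc_vec, alpha
```

Sentences aligned with the label embedding dominate `doc_vec`, which is what lets the model down-weight content inconsistent with the author's sentiment.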
Lung cancer has long threatened people's health and lives. Lung nodules, as early signs of lung cancer, have important clinical significance and research value for its diagnosis. The features captured by traditional convolutional neural networks are limited, and the traditional YOLO method suffers from low accuracy and imprecise localization. To address this, this paper proposes a new lung nodule detection algorithm based on YOLOv3. Inception ResBlocks are added to the feature network of YOLOv3 so that it can extract richer feature information, and a new bounding box regression loss function, the GDIoU loss, is proposed to make bounding box regression more accurate and further improve detection performance. In experiments, the model reaches an AP of 83.5% and a sensitivity of 92.6%. The proposed method performs well in localization accuracy and detection rate, and it can avoid false and missed detections to a certain extent, providing a new approach to lung nodule detection.
"A New Object Detection Algorithm Based on YOLOv3 for Lung Nodules." Kejia Xu, Hong Jiang, Wen-Gen Tang. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404609.
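The GDIoU loss itself is not specified in the abstract. As a reference point, here is a sketch of the standard GIoU loss that such bounding-box regression variants build on, for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def giou_loss(box_a, box_b):
    """Standard GIoU loss: 1 - IoU + (area of enclosing box C not covered
    by the union) / (area of C). Boxes are (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # smallest axis-aligned box enclosing both inputs
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return 1.0 - iou + (c_area - union) / c_area
```

Unlike plain 1 - IoU, this loss still produces a useful gradient for disjoint boxes (loss > 1, shrinking as the boxes approach), which is why IoU-family losses improve localization.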
Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.
A rapidly growing population presents many challenges to healthcare and security surveillance around the world. Human activity recognition is an active research area aimed at recognizing and understanding various activities. Many researchers capture and represent details of human body gestures to determine human activity or action; results, however, remain unsatisfactory due to the inclusion of irrelevant images, and existing models are too rudimentary to represent image semantics with sufficient specificity. In this paper, we propose a methodology for recognizing human activities of daily living with four steps: (1) text-based concept embedding, (2) semi-supervised graph node construction, (3) a graph convolutional network, and (4) measurement and evaluation. The experimental results indicate that our approach offers significant performance improvements on dataset 2 under 10-fold cross-validation, with a maximum of 79.34%.
"Human Activities of Daily Living Recognition with Graph Convolutional Network." N. Chinpanthana, Yunyu Liu. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404557.
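Step (3), the graph convolutional network, can be sketched as one standard GCN propagation layer with symmetric normalization. This is the generic operation, not the authors' exact network:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step with symmetric normalization:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    d = A_hat.sum(axis=1)                       # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
```

Stacking a few such layers lets each node's representation mix in information from its graph neighborhood before classification.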
Real-time and reliable traffic flow estimation is the basis of urban traffic management and control. However, existing research focuses on using the historical data of monitored intersections to predict future traffic conditions, and few effective algorithms can infer the real-time traffic state of unmonitored intersections from the limited road surveillance available in an urban road system. In this paper, we introduce a new solution to this prediction task that uses traffic data, in particular historical taxi data, traffic network data, and intersection history. The proposed solution combines a GCN and a CGAN, with an improved U-Net serving as a key component of the generator. We capture the relationship between monitored and unmonitored intersections through floating taxi cabs covering the whole city. The CGAN framework adjusts the weights and enhances the inference ability needed to generate a complete traffic status under current conditions. The experimental results show that our method outperforms other methods in the accuracy of traffic volume inference.
"Traffic Condition Prediction of Urban Roads Based on Neural Network." Ruyi Zhu. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404621.
The world economy is developing rapidly, and new Internet companies keep emerging. To effectively improve employees' working efficiency, enhance enterprise competitiveness, and let managers track employees' working status at any time, this paper proposes a face recognition method for enterprise workstations based on a convolutional neural network (CNN) optimization algorithm. First, an employee face classification model is established on the TensorFlow deep learning framework, and a CNN is used to extract employee face image features. The Keras deep learning library is then used to train the face recognition model. Finally, the TensorFlow-supported momentum gradient descent method optimizes the CNN, and a loss function is used to evaluate model performance, thereby improving the recognition accuracy of the face recognition algorithm. The proposed algorithm is applied in practice to identify employees' working status; its validity is verified through questionnaire results and comparison with typical face recognition algorithms. The experimental results show that the proposed method achieves higher recognition accuracy and better practicality, which will help companies understand the working status of their employees.
"Face Recognition Method for Enterprise Workstations Based on Convolutional Neural Network Optimization Algorithm." Naiyuan Tian, Xiangyun Zhang, Tian Liu, Chen-Xia Zhao. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404585.
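The momentum gradient descent optimization named above follows the update rule used by Keras' SGD optimizer: v ← momentum·v − lr·grad, then w ← w + v. A minimal NumPy sketch on a toy quadratic objective (the real training would use the full CNN's gradients):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One momentum gradient-descent update, matching the tf.keras
    SGD(momentum=...) rule: v <- momentum*v - lr*grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# demo: minimize f(w) = ||w||^2 (gradient 2w) from a fixed start
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(300):
    w, v = momentum_step(w, 2 * w, v, lr=0.05)
```

The velocity term accumulates consistent gradient directions across steps, which is what speeds up convergence compared with plain gradient descent.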
Ziya Yu, Kai Qiao, Chi Zhang, Linyuan Wang, Bin Yan
In recent years, with the development of deep learning, the integration of neuroscience and computer vision has deepened. In computer vision, deep learning has made it possible to generate images from text and to extract semantic understanding from images. Here, text refers to human language, which typically must be encoded before a computer can process it. The human brain's visual system likewise produces "descriptions" of visual stimuli, that is, a "language" generated by the brain itself. Reconstruction of visual information is the process of reconstructing visual stimuli from the brain's representation, the most difficult goal in visual decoding, and given current knowledge of visual mechanisms the brain's "language" remains hard to interpret. Inspired by text-to-image generation, we regard voxel responses as the brain's "language" in order to reconstruct visual stimuli, and we build an end-to-end visual decoding model under the condition of a small number of samples. We retrained a generative adversarial network (GAN) originally used for text-to-image generation on 1,200 training samples (natural image stimuli and their corresponding voxel responses), treating the voxel responses as the brain's semantic information and feeding them to the GAN as prior information. The results show that the trained decoding model can successfully reconstruct natural images. This suggests the feasibility of reconstructing visual stimuli from "brain language", indicates that an end-to-end model is more likely to learn a direct mapping between brain activity and visual perception, and further points to the great potential of combining neuroscience and computer vision.
"End-to-End Image Reconstruction of Image from Human Functional Magnetic Resonance Imaging Based on the "Language" of Visual Cortex." Ziya Yu, Kai Qiao, Chi Zhang, Linyuan Wang, Bin Yan. In Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23). doi:10.1145/3404555.3404593.
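Feeding voxel responses to the GAN as prior information amounts to conditioning the generator on the voxel vector. A minimal sketch of that conditioning (a tiny MLP generator with illustrative dimensions, not the retrained text-to-image GAN):

```python
import numpy as np

def generator_forward(z, voxels, W1, b1, W2, b2):
    """Conditional-generator sketch: the voxel response vector acts as the
    conditioning 'language', concatenated with the noise input and mapped
    through a small MLP to a flattened image in [-1, 1]."""
    x = np.concatenate([z, voxels])      # condition on brain activity
    h = np.maximum(0.0, W1 @ x + b1)     # hidden layer (ReLU)
    return np.tanh(W2 @ h + b2)          # tanh keeps pixel values in [-1, 1]
```

During adversarial training, the discriminator would likewise see the voxel vector alongside each real or generated image, so the generator learns a stimulus-specific mapping rather than an unconditional image prior.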