{"title":"Performance evaluation of character-level CNNs using tweet data and analysis for weight perturbations","authors":"Kazuteru Miyazaki, Masaaki Ida","doi":"10.1007/s10015-024-00944-9","DOIUrl":null,"url":null,"abstract":"<div><p>Character-level convolutional neural networks (CLCNNs) are commonly used to classify textual data. CLCNN is used as a more versatile tool. For natural language recognition, after decomposing a sentence into character units, each unit is converted into a corresponding character code (e.g., Unicode values) and the code is input into the CLCNN network. Thus, sentences can be treated like images. We have previously applied a CLCNN to verify whether a university’s diploma and/or curriculum policies are well written. In this study, we experimentally confirm the effectiveness of CLCNN using tweet data. In particular, we focus on the effect of the number of units on performance using the following two types of data; one is a real and public tweet dataset on the reputation of a cell phone, and the other is the NTCIR-13 MedWeb task, which consists of pseudo-tweet data and is a well-known collection of tests for multi-label problems. Results of experiments conducted by varying the number of units in the all-coupled layer confirmed the agreement of the results with the theorem introduced in the Amari’s book (Amari in Mathematical Science New Development of Information Geometry, For Senior & Graduate Courses. SAIENSU-SHA Co., 2014). Furthermore, in the NTCIR-13 MedWeb task, we analyze two kinds of experiments, the effects of kernel size and weight perturbation. The results of the difference in the kernel size suggest the existence of an optimal kernel size for sentence comprehension. The results of perturbations to the convolutional layer and pooling layer indicate the possibility of relationship between the numbers of degrees of freedom and network parameters.</p></div>","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s10015-024-00944-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Character-level convolutional neural networks (CLCNNs) are commonly used to classify textual data. CLCNN is used as a more versatile tool. For natural language recognition, after decomposing a sentence into character units, each unit is converted into a corresponding character code (e.g., Unicode values) and the code is input into the CLCNN network. Thus, sentences can be treated like images. We have previously applied a CLCNN to verify whether a university’s diploma and/or curriculum policies are well written. In this study, we experimentally confirm the effectiveness of CLCNN using tweet data. In particular, we focus on the effect of the number of units on performance using the following two types of data; one is a real and public tweet dataset on the reputation of a cell phone, and the other is the NTCIR-13 MedWeb task, which consists of pseudo-tweet data and is a well-known collection of tests for multi-label problems. Results of experiments conducted by varying the number of units in the all-coupled layer confirmed the agreement of the results with the theorem introduced in the Amari’s book (Amari in Mathematical Science New Development of Information Geometry, For Senior & Graduate Courses. SAIENSU-SHA Co., 2014). Furthermore, in the NTCIR-13 MedWeb task, we analyze two kinds of experiments, the effects of kernel size and weight perturbation. The results of the difference in the kernel size suggest the existence of an optimal kernel size for sentence comprehension. The results of perturbations to the convolutional layer and pooling layer indicate the possibility of relationship between the numbers of degrees of freedom and network parameters.