Using word segmentation and SVM to assess readability of Thai text for primary school students
Patcharanut Daowadung, Yaw-Huei Chen
2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE), 2011-05-11
https://doi.org/10.1109/JCSSE.2011.5930115
This research develops a readability assessment technique for finding Thai-language reading materials appropriate for primary school students. The corpus contains 1050 articles from textbooks used by students in grades 1 through 6. We preprocess the articles with the Ling CD program for Thai word segmentation and use mutual information (MI) to select the most important terms in the corpus. Term frequency-inverse document frequency (TF-IDF) values are used as features for support vector machines (SVMs), which generate the classification models. Experimental results show that the proposed method reaches an F-measure of 0.83 in identifying articles suitable for middle-grade primary school students.
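The feature-construction steps named in the abstract (MI-based term selection followed by TF-IDF weighting) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the abstract does not give the exact MI formulation, the TF-IDF variant, or the number of terms kept, and the documents are taken to be already segmented into tokens (the job done by the Ling CD program in the paper). The function names and the toy tokens and labels are hypothetical, and the SVM training step is left out because it would normally be delegated to an external SVM library.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term, cls):
    """MI (in bits) between 'term occurs in a document' and 'document is in cls'."""
    n = len(docs)
    # 2x2 contingency counts keyed by (term present?, document in class?)
    counts = Counter((term in doc, label == cls) for doc, label in zip(docs, labels))
    mi = 0.0
    for t in (True, False):
        for c in (True, False):
            n_tc = counts[(t, c)]
            if n_tc == 0:
                continue  # an empty cell contributes nothing to the sum
            n_t = counts[(t, True)] + counts[(t, False)]
            n_c = counts[(True, c)] + counts[(False, c)]
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_terms(docs, labels, k):
    """Keep the k terms with the highest MI against any class (assumed criterion)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    classes = sorted(set(labels))
    score = {
        term: max(mutual_information(docs, labels, term, c) for c in classes)
        for term in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda term: (-score[term], term))[:k]


def tfidf_vector(doc, docs, terms):
    """TF-IDF vector over the selected terms: raw count times log(N / df)."""
    n = len(docs)
    tf = Counter(doc)
    vec = []
    for term in terms:
        df = sum(1 for d in docs if term in d)
        idf = math.log(n / df) if df else 0.0
        vec.append(tf[term] * idf)
    return vec


# Hypothetical, already-segmented toy corpus with two made-up grade bands.
docs = [
    ["cat", "sat", "mat"],
    ["cat", "ran", "mat"],
    ["economy", "policy", "cat"],
    ["economy", "market", "policy"],
]
labels = ["low", "low", "high", "high"]

selected = select_terms(docs, labels, k=3)
features = [tfidf_vector(doc, docs, selected) for doc in docs]
print(selected)
print(features)
```

Each row of `features` is what a classifier would receive as input for one article; in a setup like the one the abstract describes, these vectors and their grade labels would then be passed to an SVM trainer.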