Question Classification from Thai Sentences by Considering Word Context to Question Generation
Authors: Saranlita Chotirat, P. Meesad, H. Unger
Published in: 2022 Research, Invention, and Innovation Congress: Innovative Electricals and Electronics (RI2C), 2022-08-04
DOI: 10.1109/RI2C56397.2022.9910313 (https://doi.org/10.1109/RI2C56397.2022.9910313)
Citations: 2
Abstract
Automated question generation plays a role in many fields and applications, such as question answering systems, examination systems, and information retrieval. Before learning to generate questions, one should understand how to classify them. This research aims to generate possible questions by considering the question categories obtained from question classification based on Natural Language Processing (NLP). We compared traditional classification models (Logistic Regression, Support Vector Machine (SVM), and Multinomial Naïve Bayes) with deep learning techniques (Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM), combined CNN and BiLSTM models, and BERT models). The experimental results show that an NLP-based preprocessing phase can enhance question classification. Classifying sentences into question categories attained an average micro $F_1$-score of 91.40% when a BERT model, the pre-trained WangchanBERTa, was applied to simple sentences. In contrast, the SVM model reached a satisfactory average micro $F_1$-score of 82.07% (up from 80.37% on the original input) when all POS tags were added to unigram + bigram TF-IDF features. The CNN model with GloVe, with the focused POS tags added, reached a satisfactory average micro $F_1$-score of 79.79%.
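
The scores above are micro-averaged $F_1$ values. As a minimal sketch of how that metric is usually computed (not the authors' evaluation code), the function below pools true positives, false positives, and false negatives over all classes and computes $F_1$ once. The function name `micro_f1` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1 once."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())

    denom = 2 * total_tp + total_fp + total_fn
    return 2 * total_tp / denom if denom else 0.0


# Hypothetical question-category labels, for illustration only.
example_true = ["who", "where", "when", "who", "what"]
example_pred = ["who", "where", "who", "who", "when"]
print(micro_f1(example_true, example_pred))  # 0.6
```

When every sentence receives exactly one category, each error counts once as a false positive and once as a false negative, so the micro $F_1$ equals plain accuracy (3 of 5 correct in the example).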