Off-topic Detection Model based on Biterm-LDA and Doc2vec
Pan Liu, Jie Liu, Xiaoli Ma, Jianshe Zhou
Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference, 2019-06-22
DOI: 10.1145/3341069.3342989 (https://doi.org/10.1145/3341069.3342989)
Citations: 2
Abstract
Chinese composition writing occupies an extremely important position in primary and secondary school Chinese education. With advances in natural language processing, automatic essay review systems have gradually matured, which has greatly promoted the development of composition writing. Off-topic detection in particular plays a key role in such systems. We propose an off-topic detection method in three steps. First, we use Biterm-LDA combined with Doc2vec to capture the topic and semantics of a composition. Second, we propose a threshold calculation method based on the class center of the compositions written on each topic. Finally, the ROC curve is used to find the optimal threshold for each type of topic composition, and an essay is judged off-topic according to that threshold. Experiments on five types of topic composition show that the average F1-score of off-topic detection reaches about 65%.
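The class-center and ROC-threshold steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not confirm: each essay is already represented by one vector (for example a Biterm-LDA topic distribution joined with a Doc2vec embedding), the class center is the mean of the on-topic vectors, similarity is cosine similarity, and the "optimal" ROC point is the one maximizing Youden's J (TPR minus FPR). All function names and the toy vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Class center: component-wise mean of the essay vectors (assumed definition)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_threshold(scores, labels):
    """Scan candidate thresholds along the ROC curve.

    labels: 1 = on-topic, 0 = off-topic. An essay is predicted on-topic
    when its score >= threshold. Returns (threshold, J) maximizing
    Youden's J = TPR - FPR (an assumed optimality criterion).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def is_off_topic(vec, center, threshold):
    """Judge an essay off-topic when it is less similar to the center than the threshold."""
    return cosine(vec, center) < threshold

# Toy 2-D vectors standing in for real essay representations (hypothetical data).
on_topic = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
off_topic = [[0.1, 1.0], [0.0, 0.9]]

center = centroid(on_topic)
essays = on_topic + off_topic
labels = [1] * len(on_topic) + [0] * len(off_topic)
scores = [cosine(v, center) for v in essays]
threshold, youden_j = best_threshold(scores, labels)
```

In the setting the abstract describes, this procedure would be repeated once per topic type, so that each of the five composition topics gets its own center and its own threshold.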