Duplicate Question Management and Answer Verification System
Somak Mukherjee, N. S. Kumar
2019 IEEE Tenth International Conference on Technology for Education (T4E)
Published: 2019-12-01
DOI: 10.1109/T4E.2019.00067
Citations: 2
Abstract
Managing large data sets of question papers can be cumbersome, especially when they contain potentially duplicate or erroneous questions. A natural language system that handles these issues automatically would greatly speed up the verification of such data sets. This paper presents a tool for identifying semantic similarity between sentences in plain-text English. The model uses handpicked features: simple structural features, and word-embedding features built by converting each sentence into a vector with word2vec and computing multiple distance metrics between the resulting sentence vectors. The model is trained on weak hardware, showing that sufficiently high accuracy is achievable on low-end machines. The results demonstrate that boosting is effective at improving the performance of simple learning models, enabling complex learning in the absence of high-end hardware.
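The feature pipeline and boosting step described in the abstract can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. The toy embedding table `EMB` stands in for a trained word2vec model. Sentence vectors are assumed to be the mean of the word vectors. The three distance metrics (cosine, Euclidean, Manhattan) and the two structural features (token-count difference, Jaccard word overlap) are illustrative choices, since the abstract does not list them. AdaBoost with decision stumps is used as one concrete form of boosting. All function names and the example pairs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy embedding table standing in for a trained word2vec model.
EMB: Dict[str, List[float]] = {
    "add": [1.0, 0.0],
    "sum": [1.0, 0.0],
    "numbers": [0.0, 1.0],
    "river": [-1.0, 0.0],
}
DIM = 2

# Hypothetical labelled question pairs: +1 = duplicate, -1 = not duplicate.
PAIRS: List[Tuple[str, str, int]] = [
    ("add two numbers", "sum two numbers", 1),
    ("define a function", "define a function please", 1),
    ("add two numbers", "name a river", -1),
    ("define a function", "sort the list", -1),
]

def sentence_vector(sentence: str, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the in-vocabulary word vectors (assumed pooling); zero vector if none."""
    vecs = [emb[t] for t in sentence.lower().split() if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an empty sentence vector as maximally dissimilar
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))

def pair_features(s1: str, s2: str, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Structural features followed by distance features between sentence vectors."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(t1 & t2) / len(t1 | t2) if (t1 | t2) else 1.0
    len_diff = float(abs(len(s1.split()) - len(s2.split())))
    v1, v2 = sentence_vector(s1, emb, dim), sentence_vector(s2, emb, dim)
    return [len_diff, jaccard, cosine_distance(v1, v2), euclidean(v1, v2), manhattan(v1, v2)]

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)

def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, thr, pol = stump
    return pol if x[f] <= thr else -pol

def train_adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost over decision stumps (one possible form of boosting)."""
    w = [1.0 / len(X)] * len(X)
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((f, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (f, thr, pol), err
        eps = min(max(best_err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0.0:
            break  # perfect stump: further rounds would only repeat it
        # Up-weight misclassified pairs, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0.0 else -1

if __name__ == "__main__":
    X = [pair_features(a, b, EMB, DIM) for a, b, _ in PAIRS]
    y = [label for _, _, label in PAIRS]
    model = train_adaboost(X, y)
    print([predict(model, x) for x in X])
```

With these four toy pairs, the Jaccard feature alone separates duplicates from non-duplicates, so the first stump already reaches zero weighted training error and boosting stops after one round. On real question data no single stump would be that clean, and later rounds would reweight the pairs that earlier stumps misclassify, which is where boosting adds value over a single weak learner.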