Topic Extraction and Classification for Questions Posted in Community-Based Question Answering Services
Q. Ma, M. Murata
2019 International Conference on Computational Science and Computational Intelligence (CSCI), published 2019-12-01
DOI: 10.1109/CSCI49370.2019.00253 (https://doi.org/10.1109/CSCI49370.2019.00253)
Citation count: 1
Abstract
This paper presents methods that simultaneously perform topic/keyword extraction and unsupervised classification of questions posted to community-based question answering (CQA) services, or Q&A websites, using topic models and hybrid models. Large-scale experiments on two kinds of data, category data and subtyping data, demonstrate the effectiveness of the methods. Measured by purity and correct rate, topic models outperform clustering methods and hybrid models outperform topic models in question classification; applying term frequency-inverse document frequency (TF-IDF) weighting is effective for the subtyping data. Manual evaluation of the extracted keywords shows that the topic models are also effective for topic extraction.
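
The abstract reports purity as one of its evaluation measures but does not define it. The following minimal sketch assumes the standard clustering definition (each cluster is credited with its most frequent gold label, and purity is the credited fraction of all items); the paper's exact formulation may differ. The function name `purity` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Standard clustering purity (assumed definition, not confirmed by the paper).

    For each cluster, count the items carrying its most frequent gold label,
    sum those counts over all clusters, and divide by the number of items.
    """
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have the same length")
    if not cluster_ids:
        raise ValueError("need at least one item")

    # Tally gold labels separately inside each predicted cluster.
    labels_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        labels_by_cluster[cluster][label] += 1

    # Credit each cluster with the size of its majority label.
    majority_total = sum(max(c.values()) for c in labels_by_cluster.values())
    return majority_total / len(cluster_ids)


# Toy data (hypothetical): eight questions assigned to three clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["travel", "travel", "health", "health", "health",
          "travel", "cooking", "cooking"]
print(purity(clusters, labels))  # 0.75
```

In the toy data each cluster has two items with its majority label, so 6 of 8 items are credited. Under this definition purity rises trivially as the number of clusters grows, which is one reason it is usually reported alongside a second measure such as the correct rate mentioned above.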