Topic Extraction and Classification for Questions Posted in Community-Based Question Answering Services

2019 International Conference on Computational Science and Computational Intelligence (CSCI). Pub Date: 2019-12-01. DOI: 10.1109/CSCI49370.2019.00253

Q. Ma, M. Murata

Citations: 1

Abstract

This paper presents methods for simultaneously performing topic/keyword extraction and unsupervised classification of questions posted on community-based question answering (CQA) services, or Q&A websites, using topic models and hybrid models. Large-scale experiments on two kinds of data, referred to as category data and subtyping data, show the effectiveness of our methods. Purity and correct-rate results show that topic models outperform clustering methods, that hybrid models outperform topic models in question classification, and that adopting term frequency-inverse document frequency (TF-IDF) is effective for the subtyping data. Manual evaluation of the extracted keywords shows the effectiveness of the topic models in topic extraction.
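The abstract does not spell out how purity is computed, so the following is only a minimal sketch using the standard definition: each cluster (or topic) is credited with its most frequent gold category, and the credited counts are summed and divided by the number of questions. The function name `purity`, the cluster assignments, and the category labels are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Standard clustering purity: (1/N) * sum of majority-class counts per cluster."""
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each gold label occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        by_cluster[cluster][label] += 1

    # Each cluster contributes the size of its largest gold class.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 questions, 3 clusters (e.g. each question's
# assigned topic), and made-up gold categories.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
gold = ["travel", "travel", "health", "health", "health",
        "travel", "sports", "sports"]

score = purity(clusters, gold)
print(f"purity = {score:.2f}")
```

In this toy case each cluster's majority class covers two questions, so six of the eight questions are credited. Purity grows trivially as the number of clusters increases, which is one reason it is usually reported alongside a second measure such as the correct rate.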


