From Small-scale to Large-scale Text Classification

The World Wide Web Conference. Pub Date: 2019-05-13. DOI: 10.1145/3308558.3313563

Kang-Min Kim, Yeachan Kim, Jungho Lee, Ji-Min Lee, SangKeun Lee


Citations: 15

Abstract

Neural network models have achieved impressive results in text classification. However, existing approaches often suffer from insufficient training data in large-scale text classification, which involves a large number of categories (e.g., several thousand). Several neural network models have used multi-task learning to overcome the limited amount of training data, but these approaches are themselves limited to small-scale text classification. In this paper, we propose a novel neural network-based multi-task learning framework for large-scale text classification. To this end, we first treat the different scales of text classification (i.e., large and small numbers of categories) as multiple, related tasks. Then, we train the proposed neural network, which learns the small- and large-scale text classification tasks simultaneously. In particular, we further enhance this multi-task learning architecture with a gate mechanism, which controls the flow of features between the small- and large-scale text classification tasks. Experimental results show that the proposed model improves the performance of the large-scale text classification task with the help of the small-scale text classification task. The proposed scheme improves on state-of-the-art techniques by as much as 14% in micro-averaged F1-score and 5% in macro-averaged F1-score.
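The abstract does not give the form of the gate, so the following is only an illustrative sketch of how a gate could control the flow of features from a small-scale task into a large-scale task. It assumes a sigmoid gate computed from the concatenated features and an additive fusion; the names `gated_transfer`, `W_g`, `b_g`, `h_small`, and `h_large` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gated_transfer(h_small, h_large, W_g, b_g):
    """Fuse small-scale features into large-scale features through a gate.

    ASSUMPTION (not from the paper): the gate is
        g = sigmoid(W_g @ [h_small; h_large] + b_g)
    and the fused representation is
        h_large + g * h_small   (element-wise).
    Both feature vectors are assumed to have the same length.
    """
    concat = h_small + h_large  # list concatenation: [h_small; h_large]
    pre_activation = matvec(W_g, concat)
    gate = [sigmoid(p + b) for p, b in zip(pre_activation, b_g)]
    fused = [hl + g * hs for hl, g, hs in zip(h_large, gate, h_small)]
    return fused, gate


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy features for one document from the two task-specific encoders.
    h_small = [2.0, -2.0]
    h_large = [1.0, 1.0]
    # With all-zero gate parameters the gate is 0.5 everywhere,
    # so half of each small-scale feature is passed through.
    W_g = [[0.0] * 4, [0.0] * 4]
    b_g = [0.0, 0.0]
    fused, gate = gated_transfer(h_small, h_large, W_g, b_g)
    print("gate:", gate)
    print("fused large-scale features:", fused)
    # The fused features would then feed a classifier over the large label set.
    print("toy class probabilities:", softmax(fused))
```

In a trained model the gate parameters would be learned jointly with both classifiers, so each gate value between 0 and 1 decides how much of the corresponding small-scale feature reaches the large-scale task.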


