FastClass:一种时间效率的弱监督文本分类方法

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing Pub Date : 2022-12-11 DOI:10.48550/arXiv.2212.05506

Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang

{"title":"FastClass:一种时间效率的弱监督文本分类方法","authors":"Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang","doi":"10.48550/arXiv.2212.05506","DOIUrl":null,"url":null,"abstract":"Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require substantial amount of unlabeled data and takes a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords.Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"14 1","pages":"4746-4758"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification\",\"authors\":\"Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang\",\"doi\":\"10.48550/arXiv.2212.05506\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require substantial amount of unlabeled data and takes a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords.Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.\",\"PeriodicalId\":74540,\"journal\":{\"name\":\"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing\",\"volume\":\"14 1\",\"pages\":\"4746-4758\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2212.05506\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.05506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

弱监督文本分类旨在仅使用类描述和未标记的数据来训练分类器。最近的研究表明，关键字驱动的方法可以在各种任务上达到最先进的性能。然而，这些方法不仅依赖于精心设计的类描述来获得特定于类的关键字，而且还需要大量未标记的数据，并且需要很长时间来训练。本文提出了一种高效的弱监督分类方法FastClass。它使用密集文本表示从外部未标记语料库中检索类相关文档，并选择最优子集来训练分类器。与关键字驱动的方法相比，我们的方法较少依赖于初始类描述，因为它不再需要将每个类描述扩展为一组特定于类的关键字。在广泛的分类任务上的实验表明，所提出的方法在分类精度方面经常优于关键词驱动模型，并且通常具有数量级的训练速度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require substantial amount of unlabeled data and takes a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords.Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

自引率

0.00%

发文量