异构监督主题模型

IF 4.2 1区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Transactions of the Association for Computational Linguistics Pub Date : 2022-06-01 DOI:10.1162/tacl_a_00487

Dhanya Sridhar, Hal Daumé, D. Blei

{"title":"异构监督主题模型","authors":"Dhanya Sridhar, Hal Daumé, D. Blei","doi":"10.1162/tacl_a_00487","DOIUrl":null,"url":null,"abstract":"Abstract Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"10 1","pages":"732-745"},"PeriodicalIF":4.2000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Heterogeneous Supervised Topic Models\",\"authors\":\"Dhanya Sridhar, Hal Daumé, D. Blei\",\"doi\":\"10.1162/tacl_a_00487\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.\",\"PeriodicalId\":33559,\"journal\":{\"name\":\"Transactions of the Association for Computational Linguistics\",\"volume\":\"10 1\",\"pages\":\"732-745\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transactions of the Association for Computational Linguistics\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1162/tacl_a_00487\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1162/tacl_a_00487","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 4

摘要

社会科学领域的研究人员经常对文本和感兴趣的结果之间的关系感兴趣，其目标是发现文本中的潜在模式并预测未见文本的结果。为此，本文开发了异构监督主题模型(HSTM)，这是一种用于文本分析和预测的概率方法。hstm假定文本和结果的联合模型，以发现有助于文本分析和预测的异构模式。hstm的主要优点是它们捕获了文本和潜在主题之间结果之间关系的异质性。为了适应hstm，我们开发了一种基于自编码变分贝叶斯框架的变分推理算法。我们研究了hstm在8个数据集上的性能，发现它们始终优于相关方法，包括微调黑盒模型。最后，我们运用hstm来分析带有亲调或反调标记的新闻文章。我们发现有证据表明，不同的语言用来表示赞成和反对的语气。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Heterogeneous Supervised Topic Models

Abstract Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Transactions of the Association for Computational Linguistics Multiple-

CiteScore

32.60

自引率

4.60%

发文量

审稿时长

8 weeks

期刊介绍： The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.