Aaron Erlich, S. G. Dantas, Benjamin E. Bagozzi, Daniel Berliner, Brian Palmer-Rubin
{"title":"政治文本即数据的多标签预测","authors":"Aaron Erlich, S. G. Dantas, Benjamin E. Bagozzi, Daniel Berliner, Brian Palmer-Rubin","doi":"10.1017/pan.2021.15","DOIUrl":null,"url":null,"abstract":"Abstract Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one’s multiple labels are low.","PeriodicalId":48270,"journal":{"name":"Political Analysis","volume":"30 1","pages":"463 - 480"},"PeriodicalIF":4.7000,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/pan.2021.15","citationCount":"6","resultStr":"{\"title\":\"Multi-Label Prediction for Political Text-as-Data\",\"authors\":\"Aaron Erlich, S. G. Dantas, Benjamin E. Bagozzi, Daniel Berliner, Brian Palmer-Rubin\",\"doi\":\"10.1017/pan.2021.15\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one’s multiple labels are low.\",\"PeriodicalId\":48270,\"journal\":{\"name\":\"Political Analysis\",\"volume\":\"30 1\",\"pages\":\"463 - 480\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2021-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1017/pan.2021.15\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Political Analysis\",\"FirstCategoryId\":\"90\",\"ListUrlMain\":\"https://doi.org/10.1017/pan.2021.15\",\"RegionNum\":2,\"RegionCategory\":\"社会学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"POLITICAL SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Political Analysis","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1017/pan.2021.15","RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"POLITICAL SCIENCE","Score":null,"Total":0}
Abstract Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one’s multiple labels are low.
期刊介绍:
Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.