A Transformer Model for Manifesto Classification Using Cross-Context Training: An Ecuadorian Case Study

Impact factor: 3.0 | Zone 2 (Sociology) | JCR Q2 (Computer Science, Interdisciplinary Applications) | Social Science Computer Review | Pub Date: 2024-07-24 | DOI: 10.1177/08944393241266220
Fernanda Barzallo, Maria Baldeon-Calisto, Margorie Pérez, Maria Emilia Moscoso, Danny Navarrete, Daniel Riofrío, Pablo Medina-Peréz, Susana K Lai-Yuen, Diego Benítez, Noel Peréz, Ricardo Flores Moyano, Mateo Fierro
Citations: 0

Abstract

Content analysis of political manifestos is necessary to understand the policies and proposed actions of a party. However, manually labeling political texts is time-consuming and labor-intensive. Transformer networks have become essential tools for automating this task. Nevertheless, these models require extensive datasets to achieve good performance. This can be a limitation in manifesto classification, where the availability of publicly labeled datasets can be scarce. To address this challenge, in this work, we developed a Transformer network for the classification of manifestos using a cross-domain training strategy. Using the database of the Comparative Manifesto Project, we implemented a fractional factorial experimental design to determine which Spanish-written manifestos form the best training set for Ecuadorian manifesto labeling. Furthermore, we statistically analyzed which Transformer architecture and preprocessing operations improve the model accuracy. The results indicate that creating a training set with manifestos from Spain and Uruguay, along with implementing stemming and lemmatization preprocessing operations, produces the highest classification accuracy. In addition, we found that the DistilBERT and RoBERTa transformer networks perform statistically similarly and consistently well in manifesto classification. Using the cross-context training strategy, DistilBERT and RoBERTa achieve 60.05% and 57.64% accuracy, respectively, in the classification of the Ecuadorian manifesto. Finally, we investigated the effect of the composition of the training set on performance. The experiments demonstrate that training DistilBERT solely with Ecuadorian manifestos achieves the highest accuracy and F1-score. Furthermore, in the absence of the Ecuadorian dataset, competitive performance is achieved by training the model with datasets from Spain and Uruguay.
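
The fractional factorial design mentioned in the abstract can be illustrated with a minimal sketch. It builds a two-level half-fraction design in which each factor marks whether one country's manifestos are included in the training set. Everything here is an assumption made for illustration: the abstract does not state the number of factors, the generator, or the candidate countries other than Spain and Uruguay, so `source_C`, `source_D`, and the `half_fraction` helper are hypothetical and do not reproduce the authors' actual design.

```python
from itertools import product


def half_fraction(factors):
    """Build a 2^(k-1) half-fraction design (hypothetical helper).

    The first k-1 factors take every +1/-1 combination; the level of the
    last factor is generated as the product of the others, so the design
    has half as many runs as the full factorial.
    """
    *base, last = factors
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        generated = 1
        for level in levels:
            generated *= level
        run = dict(zip(base, levels))
        run[last] = generated
        runs.append(run)
    return runs


# Spain and Uruguay come from the abstract; the other two names are
# placeholders for additional Spanish-language sources (assumed).
factors = ["Spain", "Uruguay", "source_C", "source_D"]
design = half_fraction(factors)

for run in design:
    # +1 means "include this country's manifestos in the training set".
    included = [name for name, level in run.items() if level == 1]
    print(included)
```

Each printed list is one candidate training set that would be used to fine-tune a classifier and score it on the Ecuadorian manifestos. With this particular generator one run excludes every source, so in practice that run would have to be dropped or replaced; the opposite half-fraction (negating the generated level) avoids it.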
Source journal: Social Science Computer Review (Social Sciences / Computer Science, Interdisciplinary Applications)
CiteScore: 9.00
Self-citation rate: 4.90%
Articles published: 95
Review time: >12 weeks
Journal description: Unique Scope. Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as societal impacts of information technology. Topics included: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, world-wide web resources for social scientists. Interdisciplinary Nature. Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.