Christina Viehmann, Tilman Beck, Marcus Maurer, Oliver Quiring, Iryna Gurevych
{"title":"调查数字媒体公共政策的意见:建立一个有监督的立场分类机器学习工具","authors":"Christina Viehmann, Tilman Beck, Marcus Maurer, Oliver Quiring, Iryna Gurevych","doi":"10.1080/19312458.2022.2151579","DOIUrl":null,"url":null,"abstract":"ABSTRACT Supervised machine learning (SML) provides us with tools to efficiently scrutinize large corpora of communication texts. Yet, setting up such a tool involves plenty of decisions starting with the data needed for training, the selection of an algorithm, and the details of model training. We aim at establishing a firm link between communication research tasks and the corresponding state-of-the-art in natural language processing research by systematically comparing the performance of different automatic text analysis approaches. We do this for a challenging task – stance detection of opinions on policy measures to tackle the COVID-19 pandemic in Germany voiced on Twitter. Our results add evidence that pre-trained language models such as BERT outperform feature-based and other neural network approaches. Yet, the gains one can achieve differ greatly depending on the specific merits of pre-training (i.e., use of different language models). Adding to the robustness of our conclusions, we run a generalizability check with a different use case in terms of language and topic. Additionally, we illustrate how the amount and quality of training data affect model performance pointing to potential compensation effects. Based on our results, we derive important practical recommendations for setting up such SML tools to study communication texts.","PeriodicalId":47552,"journal":{"name":"Communication Methods and Measures","volume":null,"pages":null},"PeriodicalIF":6.3000,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Investigating Opinions on Public Policies in Digital Media: Setting up a Supervised Machine Learning Tool for Stance Classification\",\"authors\":\"Christina Viehmann, Tilman Beck, Marcus Maurer, Oliver Quiring, Iryna Gurevych\",\"doi\":\"10.1080/19312458.2022.2151579\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT Supervised machine learning (SML) provides us with tools to efficiently scrutinize large corpora of communication texts. Yet, setting up such a tool involves plenty of decisions starting with the data needed for training, the selection of an algorithm, and the details of model training. We aim at establishing a firm link between communication research tasks and the corresponding state-of-the-art in natural language processing research by systematically comparing the performance of different automatic text analysis approaches. We do this for a challenging task – stance detection of opinions on policy measures to tackle the COVID-19 pandemic in Germany voiced on Twitter. Our results add evidence that pre-trained language models such as BERT outperform feature-based and other neural network approaches. Yet, the gains one can achieve differ greatly depending on the specific merits of pre-training (i.e., use of different language models). Adding to the robustness of our conclusions, we run a generalizability check with a different use case in terms of language and topic. Additionally, we illustrate how the amount and quality of training data affect model performance pointing to potential compensation effects. Based on our results, we derive important practical recommendations for setting up such SML tools to study communication texts.\",\"PeriodicalId\":47552,\"journal\":{\"name\":\"Communication Methods and Measures\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2022-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communication Methods and Measures\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1080/19312458.2022.2151579\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMMUNICATION\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communication Methods and Measures","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/19312458.2022.2151579","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMMUNICATION","Score":null,"Total":0}
Investigating Opinions on Public Policies in Digital Media: Setting up a Supervised Machine Learning Tool for Stance Classification
ABSTRACT Supervised machine learning (SML) provides us with tools to efficiently scrutinize large corpora of communication texts. Yet, setting up such a tool involves plenty of decisions starting with the data needed for training, the selection of an algorithm, and the details of model training. We aim at establishing a firm link between communication research tasks and the corresponding state-of-the-art in natural language processing research by systematically comparing the performance of different automatic text analysis approaches. We do this for a challenging task – stance detection of opinions on policy measures to tackle the COVID-19 pandemic in Germany voiced on Twitter. Our results add evidence that pre-trained language models such as BERT outperform feature-based and other neural network approaches. Yet, the gains one can achieve differ greatly depending on the specific merits of pre-training (i.e., use of different language models). Adding to the robustness of our conclusions, we run a generalizability check with a different use case in terms of language and topic. Additionally, we illustrate how the amount and quality of training data affect model performance pointing to potential compensation effects. Based on our results, we derive important practical recommendations for setting up such SML tools to study communication texts.
期刊介绍:
Communication Methods and Measures aims to achieve several goals in the field of communication research. Firstly, it aims to bring attention to and showcase developments in both qualitative and quantitative research methodologies to communication scholars. This journal serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches.
Additionally, Communication Methods and Measures seeks to improve research design and analysis practices by offering suggestions for improvement. It aims to introduce new methods of measurement that are valuable to communication scientists or enhance existing methods. The journal encourages submissions that focus on methods for enhancing research design and theory testing, employing both quantitative and qualitative approaches.
Furthermore, the journal is open to articles devoted to exploring the epistemological aspects relevant to communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods and articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.
In summary, Communication Methods and Measures strives to advance the field of communication research by showcasing and discussing innovative methodologies, improving research practices, and introducing new measurement methods.