Matthias Hoffmann, Felipe G. Santos, Christina Neumayer, Dan Mercea
{"title":"Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis","authors":"Matthias Hoffmann, Felipe G. Santos, Christina Neumayer, Dan Mercea","doi":"10.1080/19312458.2022.2128099","DOIUrl":null,"url":null,"abstract":"ABSTRACT This paper presents a critical discussion of the processing, reliability and implications of free big data repositories. We argue that big data is not only the starting point of scientific analyses but also the outcome of a long string of invisible or semi-visible tasks, often masked by the fetish of size that supposedly lends validity to big data. We unpack these notions by illustrating the process of extracting protest event data from the Global Database of Events, Language and Tone (GDELT) in six European countries over a period of seven years. To stand up to rigorous scientific scrutiny, we collected additional data by computational means and undertook large-scale neural-network translation tasks, dictionary-based content analyses, machine-learning classification tasks, and human coding. In a documentation and critical discussion of this process, we render visible opaque procedures that inevitably shape any dataset and show how this type of freely available datasets require significant additional resources of knowledge, labor, money, and computational power. We conclude that while these processes can ultimately yield more valid datasets, the supposedly free and ready-to-use big news data repositories should not be taken at face value.","PeriodicalId":47552,"journal":{"name":"Communication Methods and Measures","volume":null,"pages":null},"PeriodicalIF":6.3000,"publicationDate":"2022-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communication Methods and Measures","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/19312458.2022.2128099","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMMUNICATION","Score":null,"Total":0}
引用次数: 4
Abstract
ABSTRACT This paper presents a critical discussion of the processing, reliability and implications of free big data repositories. We argue that big data is not only the starting point of scientific analyses but also the outcome of a long string of invisible or semi-visible tasks, often masked by the fetish of size that supposedly lends validity to big data. We unpack these notions by illustrating the process of extracting protest event data from the Global Database of Events, Language and Tone (GDELT) in six European countries over a period of seven years. To stand up to rigorous scientific scrutiny, we collected additional data by computational means and undertook large-scale neural-network translation tasks, dictionary-based content analyses, machine-learning classification tasks, and human coding. In a documentation and critical discussion of this process, we render visible opaque procedures that inevitably shape any dataset and show how this type of freely available datasets require significant additional resources of knowledge, labor, money, and computational power. We conclude that while these processes can ultimately yield more valid datasets, the supposedly free and ready-to-use big news data repositories should not be taken at face value.
期刊介绍:
Communication Methods and Measures aims to achieve several goals in the field of communication research. Firstly, it aims to bring attention to and showcase developments in both qualitative and quantitative research methodologies to communication scholars. This journal serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches.
Additionally, Communication Methods and Measures seeks to improve research design and analysis practices by offering suggestions for improvement. It aims to introduce new methods of measurement that are valuable to communication scientists or enhance existing methods. The journal encourages submissions that focus on methods for enhancing research design and theory testing, employing both quantitative and qualitative approaches.
Furthermore, the journal is open to articles devoted to exploring the epistemological aspects relevant to communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods and articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.
In summary, Communication Methods and Measures strives to advance the field of communication research by showcasing and discussing innovative methodologies, improving research practices, and introducing new measurement methods.