A Quadrilogy for (Big) Data Reliabilities

IF 6.3 | Zone 1, Literature | Q1 COMMUNICATION | Communication Methods and Measures | Pub Date: 2021-07-03 | DOI: 10.1080/19312458.2020.1861592
K. Krippendorff
Communication Methods and Measures, vol. 15, pp. 165-189. Journal Article. https://doi.org/10.1080/19312458.2020.1861592
Citations: 2

Abstract

This paper responds to the challenge of testing the reliabilities of really big data and proposes a quadrilogy of four measures of the reliability of data, applicable quite generally. These measures grew out of the recognition that crowd-coded data contest big data scientists’ conviction that the social contexts and meanings of data become irrelevant in the face of their sheer volumes. Bigness has also challenged available inter-coder agreement coefficients and available software, which are either too restricted regarding the forms of data they accept or exceed computational limits when data become very large. In the course of tailoring Krippendorff’s alpha to very large data, the possibility emerged of dividing the concept of reliability into four separate kinds, serving different methodological aims in social research. Respectively, they assess the replicability of the process of generating data, the accuracy of generating data, the surrogacy of proposed theories, coders, formulas, or algorithms to serve as a substitute for human coders, and the decisiveness among several human judgements. Their mathematical relationships assure comparability. The paper develops this quadrilogy of agreement measures first for binary data, provides a link to software for computing it, and then extends it to nominal data – a first step towards further generalizations. It also proposes a computational path to estimate the confidence limits for each of these measures and the probabilities of accepting data as reliable when there is a chance of being below a tolerable level. It ends with a discussion of how to select reliability benchmarks appropriate for the quadrilogy of agreement measures.
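
For orientation, the coefficient the abstract starts from can be sketched in a few lines. The following is a minimal sketch of the standard Krippendorff's alpha for nominal (including binary) data, computed from a coincidence matrix. It is not the paper's quadrilogy or its software; the function name `nominal_alpha` and the input layout (one list of labels per coded unit, missing codes simply omitted) are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Standard Krippendorff's alpha for nominal data (illustrative sketch).

    units: iterable of lists; each inner list holds the labels that
    coders assigned to one unit (hypothetical input layout).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit coded only once cannot be paired
        # Every ordered pair of codes within a unit counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the number of pairable values n.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (common factor 1/n cancelled).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1) if n > 1 else 0.0

    if expected == 0:
        return math.nan  # no variation in the data: alpha is undefined
    return 1.0 - observed / expected


# Two coders, ten units, binary codes.
coder_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
print(round(nominal_alpha(zip(coder_a, coder_b)), 3))  # 0.095
```

This direct computation visits every pair of codes within every unit, which is adequate for small samples; it makes no attempt at the large-data tailoring the abstract describes.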
Source journal
CiteScore: 21.10
Self-citation rate: 1.80%
Articles published: 9
Journal description: Communication Methods and Measures brings developments in qualitative and quantitative research methodology to the attention of communication scholars and serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches. It seeks to improve research design and analysis practices, and to introduce new measurement methods that are valuable to communication scientists or that enhance existing ones. The journal encourages submissions on methods for enhancing research design and theory testing, using quantitative or qualitative approaches, and is open to articles exploring epistemological aspects of communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods, and articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.
Latest articles from this journal
JST and rJST: joint estimation of sentiment and topics in textual data using a semi-supervised approach
Using State Space Grids to Quantify and Examine Dynamics of Dyadic Conversation
Bootstrapping public entities. Domain-specific NER for public speakers
On Measurement Validity and Language Models: Increasing Validity and Decreasing Bias with Instructions
Googling Politics? Comparing Five Computational Methods to Identify Political and News-related Searches from Web Browser Histories