私营公司的大规模数据可靠吗?对流行数据集中机器生成的商业位置数据的分析

IF 3 2区 社会学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Social Science Computer Review Pub Date : 2024-04-15 DOI:10.1177/08944393241245390
Nikolitsa Grigoropoulou, Mario L. Small
{"title":"私营公司的大规模数据可靠吗?对流行数据集中机器生成的商业位置数据的分析","authors":"Nikolitsa Grigoropoulou, Mario L. Small","doi":"10.1177/08944393241245390","DOIUrl":null,"url":null,"abstract":"Large-scale data from private companies offer new opportunities to examine topics of scientific and social significance, such as racial inequality, partisan polarization, and activity-based segregation. However, because such data are often generated through automated processes, their accuracy and reliability for social science research remain unclear. The present study examines how quality issues in large-scale data from private companies can afflict the reporting of even ostensibly uncomplicated values. We assess the reliability with which an often-used device tracking data source, SafeGraph, sorted data it acquired on financial institutions into categories, such as banks and payday lenders, based on a standard classification system. We find major classification problems that vary by type of institution, and remarkably high rates of unidentified closures and duplicate records. We suggest that classification problems can affect research based on large-scale private data in four ways: detection, efficiency, validity, and bias. We discuss the implications of our findings, and list a set of problems researchers should consider when using large-scale data from companies.","PeriodicalId":49509,"journal":{"name":"Social Science Computer Review","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Are Large-Scale Data From Private Companies Reliable? An Analysis of Machine-Generated Business Location Data in a Popular Dataset\",\"authors\":\"Nikolitsa Grigoropoulou, Mario L. Small\",\"doi\":\"10.1177/08944393241245390\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale data from private companies offer new opportunities to examine topics of scientific and social significance, such as racial inequality, partisan polarization, and activity-based segregation. However, because such data are often generated through automated processes, their accuracy and reliability for social science research remain unclear. The present study examines how quality issues in large-scale data from private companies can afflict the reporting of even ostensibly uncomplicated values. We assess the reliability with which an often-used device tracking data source, SafeGraph, sorted data it acquired on financial institutions into categories, such as banks and payday lenders, based on a standard classification system. We find major classification problems that vary by type of institution, and remarkably high rates of unidentified closures and duplicate records. We suggest that classification problems can affect research based on large-scale private data in four ways: detection, efficiency, validity, and bias. We discuss the implications of our findings, and list a set of problems researchers should consider when using large-scale data from companies.\",\"PeriodicalId\":49509,\"journal\":{\"name\":\"Social Science Computer Review\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-04-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Social Science Computer Review\",\"FirstCategoryId\":\"90\",\"ListUrlMain\":\"https://doi.org/10.1177/08944393241245390\",\"RegionNum\":2,\"RegionCategory\":\"社会学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Social Science Computer Review","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1177/08944393241245390","RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

来自私营公司的大规模数据为研究种族不平等、党派两极分化和基于活动的隔离等具有科学和社会意义的课题提供了新的机会。然而,由于这些数据通常是通过自动化流程生成的,因此它们在社会科学研究中的准确性和可靠性仍不明确。本研究探讨了私营公司大规模数据的质量问题会如何影响表面上并不复杂的数值报告。我们评估了经常使用的设备追踪数据源 SafeGraph 根据标准分类系统将其获得的金融机构数据分类(如银行和发薪日贷款机构)的可靠性。我们发现,不同类型的机构存在不同的重大分类问题,未识别的关闭机构和重复记录的比率也非常高。我们认为,分类问题会从四个方面影响基于大规模私人数据的研究:检测、效率、有效性和偏见。我们讨论了研究结果的影响,并列出了研究人员在使用公司大规模数据时应考虑的一系列问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Are Large-Scale Data From Private Companies Reliable? An Analysis of Machine-Generated Business Location Data in a Popular Dataset
Large-scale data from private companies offer new opportunities to examine topics of scientific and social significance, such as racial inequality, partisan polarization, and activity-based segregation. However, because such data are often generated through automated processes, their accuracy and reliability for social science research remain unclear. The present study examines how quality issues in large-scale data from private companies can afflict the reporting of even ostensibly uncomplicated values. We assess the reliability with which an often-used device tracking data source, SafeGraph, sorted data it acquired on financial institutions into categories, such as banks and payday lenders, based on a standard classification system. We find major classification problems that vary by type of institution, and remarkably high rates of unidentified closures and duplicate records. We suggest that classification problems can affect research based on large-scale private data in four ways: detection, efficiency, validity, and bias. We discuss the implications of our findings, and list a set of problems researchers should consider when using large-scale data from companies.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Social Science Computer Review
Social Science Computer Review 社会科学-计算机:跨学科应用
CiteScore
9.00
自引率
4.90%
发文量
95
审稿时长
>12 weeks
期刊介绍: Unique Scope Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as societal impacts of informational technology. Topics included: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, world-wide web resources for social scientists. Interdisciplinary Nature Because the Uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you''ll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.
期刊最新文献
Can AI Lie? Chabot Technologies, the Subject, and the Importance of Lying Improving the Quality of Individual-Level Web Tracking: Challenges of Existing Approaches and Introduction of a New Content and Long-Tail Sensitive Academic Solution Using Google Trends Data to Study High-Frequency Search Terms: Evidence for a Reliability-Frequency Continuum Large Language Models Outperform Expert Coders and Supervised Classifiers at Annotating Political Social Media Messages Status Spill-Over in Cryptomarket for Illegal Goods
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1