Bridging Qualitative Data Silos: The Potential of Reusing Codings Through Machine Learning Based Cross-Study Code Linking

IF 3 · CAS Zone 2 (Sociology) · JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Social Science Computer Review · Pub Date: 2023-11-13 · DOI: 10.1177/08944393231215459
Sergej Wildemann, Claudia Niederée, Erick Elejalde
Citations: 0

Abstract

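For qualitative data analysis (QDA), researchers assign codes to text segments to arrange the information into topics or concepts. These annotations facilitate information retrieval and the identification of emerging patterns in unstructured data. However, this metadata is typically not published or reused after the research. Subsequent studies with similar research questions require a new definition of codes and do not benefit from other analysts' experience. Machine learning (ML) based classification seeded with such data remains a challenging task due to the ambiguity of code definitions and the inherent subjectivity of the exercise. Previous attempts to support QDA using ML rely on linear models and only examined individual datasets that were either smaller or coded specifically for this purpose. However, we show that modern approaches effectively capture at least part of the codes' semantics and may generalize to multiple studies. We analyze the performance of multiple classifiers across three large real-world datasets. Furthermore, we propose an ML-based approach to identify semantic relations of codes in different studies to show thematic faceting, enhance retrieval of related content, or bootstrap the coding process. These are encouraging results that suggest how analysts might benefit from prior interpretation efforts, potentially yielding new insights into qualitative data.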
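The abstract describes linking codes across studies by their meaning. The paper's own approach relies on ML classifiers and is not reproduced here. As a minimal illustration under our own assumptions, the sketch represents each code by the pooled word counts of the text segments it was assigned to, then links each code in one study to its most similar code in another study by cosine similarity. The bag-of-words representation, the function names (`code_profiles`, `link_codes`), the example data, and the 0.2 threshold are hypothetical choices for illustration only; they are a simple lexical baseline, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a segment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def code_profiles(coded_segments):
    """Pool the term counts of all segments assigned to each code.

    coded_segments: list of (code_name, segment_text) pairs from one study.
    Returns a dict mapping code_name -> Counter of term frequencies.
    """
    profiles = {}
    for code, text in coded_segments:
        profiles.setdefault(code, Counter()).update(tokenize(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_codes(study_a, study_b, threshold=0.2):
    """Link each code in study A to its most similar code in study B.

    Links scoring below `threshold` (a hypothetical cut-off) are dropped.
    Returns a dict: code_a -> (code_b, rounded similarity).
    """
    profiles_a = code_profiles(study_a)
    profiles_b = code_profiles(study_b)
    links = {}
    for code_a, vec_a in profiles_a.items():
        best_code, best_score = None, 0.0
        for code_b, vec_b in profiles_b.items():
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_code, best_score = code_b, score
        if best_code is not None and best_score >= threshold:
            links[code_a] = (best_code, round(best_score, 3))
    return links


# Hypothetical toy data: two studies that coded similar themes differently.
study_a = [
    ("housing", "rent is too high and the flat is small"),
    ("housing", "we moved because the rent went up"),
    ("work", "my job has long shifts and low pay"),
]
study_b = [
    ("accommodation", "the rent for our flat keeps rising"),
    ("employment", "the pay at my job is low"),
]
print(link_codes(study_a, study_b))
```

On this toy data, "housing" links to "accommodation" and "work" to "employment". Pooling segments per code keeps the comparison at the level of code definitions rather than individual segments; a realistic system would more likely swap the word counts for learned text embeddings, as the abstract's mention of modern approaches suggests.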
Source journal
Social Science Computer Review
Subject category: Social Sciences; Computer Science, Interdisciplinary Applications
CiteScore: 9.00
Self-citation rate: 4.90%
Articles published: 95
Review time: >12 weeks
Journal description:
Unique Scope: Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as societal impacts of information technology. Topics include: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, and world-wide web resources for social scientists.
Interdisciplinary Nature: Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.
Latest articles in this journal
Has ChatGPT Disrupted the Education Sector in the U.S.?
The Moderating Role of Self-Esteem in the Relationship Between Social Media Use and Life Satisfaction Among Older Adults
Feminist Identity and Online Activism in Four Countries From 2019 to 2023
Can AI Lie? Chabot Technologies, the Subject, and the Importance of Lying
Improving the Quality of Individual-Level Web Tracking: Challenges of Existing Approaches and Introduction of a New Content and Long-Tail Sensitive Academic Solution