Bridging Qualitative Data Silos: The Potential of Reusing Codings Through Machine Learning Based Cross-Study Code Linking

Social Science Computer Review · Published: 2023-11-13 · DOI: 10.1177/08944393231215459 · Impact factor: 3.0 · Zone 2 (Sociology) · JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Sergej Wildemann, Claudia Niederée, Erick Elejalde


Citation count: 0

Abstract

For qualitative data analysis (QDA), researchers assign codes to text segments to arrange the information into topics or concepts. These annotations facilitate information retrieval and the identification of emerging patterns in unstructured data. However, this metadata is typically not published or reused after the research. Subsequent studies with similar research questions require a new definition of codes and do not benefit from other analysts’ experience. Machine learning (ML) based classification seeded with such data remains a challenging task due to the ambiguity of code definitions and the inherent subjectivity of the exercise. Previous attempts to support QDA using ML rely on linear models and only examined individual datasets that were either smaller or coded specifically for this purpose. However, we show that modern approaches effectively capture at least part of the codes’ semantics and may generalize to multiple studies. We analyze the performance of multiple classifiers across three large real-world datasets. Furthermore, we propose an ML-based approach to identify semantic relations of codes in different studies to show thematic faceting, enhance retrieval of related content, or bootstrap the coding process. These are encouraging results that suggest how analysts might benefit from prior interpretation efforts, potentially yielding new insights into qualitative data.
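The abstract does not say how codes from different studies are linked. One way to make the idea of cross-study code linking concrete is the minimal sketch below. It is not the authors' method. It assumes each code is represented by the pooled bag-of-words counts of the text segments assigned to it. Each code in study A is then linked to the most cosine-similar code in study B, provided the score clears a threshold. The bag-of-words profile is a deliberately simple stand-in for the "modern approaches" the abstract mentions, such as learned text embeddings. The function names (`code_profile`, `link_codes`), the 0.2 threshold, and the toy codes and segments are all hypothetical and invented for illustration. They do not come from the paper's datasets.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens (hypothetical stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def code_profile(segments: list[str]) -> Counter:
    """Pool all text segments assigned to one code into a single term-count vector."""
    profile = Counter()
    for segment in segments:
        profile.update(bag_of_words(segment))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_codes(
    study_a: dict[str, list[str]],
    study_b: dict[str, list[str]],
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> dict[str, tuple[str, float]]:
    """Link each code in study A to its most similar code in study B, if similar enough."""
    profiles_b = {code: code_profile(segs) for code, segs in study_b.items()}
    if not profiles_b:
        return {}
    links = {}
    for code_a, segments in study_a.items():
        profile_a = code_profile(segments)
        best_code, best_score = max(
            ((code_b, cosine(profile_a, prof_b)) for code_b, prof_b in profiles_b.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            links[code_a] = (best_code, best_score)
    return links


# Toy codings from two hypothetical studies (code name -> coded segments).
study_a = {
    "job_loss": ["I lost my job last year", "the factory closed and everyone lost their job"],
    "family_support": ["my family helped me a lot", "my parents supported me"],
}
study_b = {
    "unemployment": ["after I lost my job I stayed home", "no job for months"],
    "kinship": ["my family was always there", "parents and siblings helped"],
}

print(link_codes(study_a, study_b))
```

Returning only the single best match above a threshold keeps the sketch short. A retrieval-oriented variant might instead return a ranked list of candidate codes, which an analyst could review before reusing them to bootstrap a new coding scheme.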

Source journal

Social Science Computer Review (Social Science - Computer Science: Interdisciplinary Applications)

CiteScore: 9.00
Self-citation rate: 4.90%
Articles published:
Review time: >12 weeks

Journal description

Unique Scope
Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as societal impacts of information technology. Topics include: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, world-wide web resources for social scientists.

Interdisciplinary Nature
Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.