Investigating Author Research Relatedness through Crowdsourcing: A Replication Study on MTurk

António Correia, Dennis Paulino, H. Paredes, D. Guimaraes, D. Schneider, Benjamim Fonseca

DOI: 10.1109/CSCWD57460.2023.10152707 (https://doi.org/10.1109/CSCWD57460.2023.10152707)
Journal: Computer Supported Cooperative Work-The Journal of Collaborative Computing, vol. 26, no. 3, pp. 77-82
Published: 2023-05-24 (journal article)
JCR: Q3, Computer Science, Interdisciplinary Applications; impact factor 2.0
Citations: 0
Abstract
Determining the relatedness of publications by detecting similarities and connections between researchers and their outputs can help science stakeholders worldwide find areas of common interest and potential collaboration. To this end, many studies have explored authorship attribution and research similarity detection using automatic approaches. Nonetheless, inferring author research relatedness from imperfect data containing errors and multiple references to the same entities is a long-standing challenge. In a previous study, we conducted an experiment in which a homogeneous crowd of volunteers contributed to a set of author name disambiguation tasks. The results showed an overall accuracy above 75%, and we also found important effects tied to the confidence level participants reported for correct answers. However, that study left many open questions about the comparative accuracy of a large, heterogeneous crowd working for monetary rewards. This paper addresses some of these unanswered questions by repeating the experiment with a crowd of 140 paid online workers recruited via the MTurk microtask crowdsourcing platform. Our replication study shows high accuracy for name disambiguation tasks based on authorship-level information and content features. These findings carry additional informative value because they also examine crowd behavior in terms of task duration and mean proportion of clicks per worker, with implications for interface and interaction design.
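As a rough illustration of how answers from a crowd of workers on such disambiguation tasks might be aggregated and scored against a gold standard, the sketch below computes a per-task majority vote (plain and confidence-weighted) and overall accuracy. This is not the authors' actual pipeline; the response schema, field names, and the confidence-weighting scheme are illustrative assumptions only.

```python
from collections import Counter, defaultdict

# Hypothetical worker responses: (worker_id, task_id, answer, self-reported confidence 1-5).
# The schema is an assumption for illustration, not the paper's data format.
responses = [
    ("w1", "t1", "same_author", 5),
    ("w2", "t1", "same_author", 3),
    ("w3", "t1", "different_author", 2),
    ("w1", "t2", "different_author", 4),
    ("w2", "t2", "different_author", 5),
    ("w3", "t2", "same_author", 1),
]

# Gold labels for each name disambiguation task (also illustrative).
gold = {"t1": "same_author", "t2": "different_author"}

def aggregate(responses, weighted=False):
    """Majority vote per task; optionally weight each vote by reported confidence."""
    votes = defaultdict(Counter)
    for _, task, answer, conf in responses:
        votes[task][answer] += conf if weighted else 1
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}

def accuracy(predictions, gold):
    """Fraction of tasks where the aggregated answer matches the gold label."""
    return sum(predictions[t] == gold[t] for t in gold) / len(gold)

plain = aggregate(responses)
weighted = aggregate(responses, weighted=True)
print(f"Majority-vote accuracy: {accuracy(plain, gold):.2f}")
print(f"Confidence-weighted accuracy: {accuracy(weighted, gold):.2f}")
```

Weighting votes by self-reported confidence is one simple way to operationalize the confidence effects the abstract mentions; whether it helps depends on how well workers calibrate their own confidence.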
About the journal:
Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW.
The CSCW Journal arose in response to the growing interest in the design, implementation, and use of technical systems (including computing, information, and communications technologies) that support people working cooperatively, and its scope continues to encompass the multifarious aspects of research within CSCW and related areas.
The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it particularly welcomes submissions that (a) report findings from ethnographic or similar in-depth fieldwork on work practices with a view to their technological implications, (b) report empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.