Translating surveys to surveillance on social media: methodological challenges & solutions
Authors: Chao Yang, P. Srinivasan
DOI: 10.1145/2615569.2615696
Published: 2014-06-23, Proceedings of the ... ACM Web Science Conference, pp. 4-12
Citations: 8
Abstract
Passive surveillance of preferences, opinions, and behaviors on social media is becoming increasingly common. The general goal is to make inferences from observations collected from the numerous posts publicly available in blogs, microblogs, and other social forums. A traditional approach to collecting observations is to query a random (or convenience) sample of individuals with surveys. A wide variety of well-respected survey instruments have been developed over many decades, especially in the social sciences. The question addressed here is: how does one 'translate' a survey of interest into surveillance strategies on social media? Specifically, how does one find the posts that could be interpreted as valid responses to the survey? Developing a general methodology for translating a survey into social media surveillance might further the inclusion of social media research in traditional social science research. We propose a translation methodology using a well-reputed survey (the Satisfaction with Life Scale) as an example. A second methodological contribution, going beyond the survey-translation focus, is a crowdsourcing approach that, we claim with reasonable confidence, finds close to all the relevant items in a dataset. This differs from the standard approach of asking workers to annotate all items in a small dataset. Our method supports more accurate evaluations (i.e., more precise recall calculations) as well as the development of larger training datasets. Finally, the resulting surveillance method derived from the life satisfaction survey achieves recall, precision, and F scores between 0.59 and 0.65. This is considerably better than standard methods using lexicons (precision around 0.16) or classifiers (precision, recall, and F scores between 0.32 and 0.38).
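The abstract reports the surveillance method's quality as precision, recall, and F scores. As an illustrative sketch (not code from the paper), these metrics can be computed by comparing the set of posts a surveillance method retrieves against the set annotators judged relevant; the function name and the example post IDs below are hypothetical:

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 over sets of post IDs.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    F1        = harmonic mean of precision and recall
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical example: the method flags posts 0-9 as survey responses;
# annotators judged posts 0-5, 10, and 11 relevant.
p, r, f = precision_recall_f1(range(10), [0, 1, 2, 3, 4, 5, 10, 11])
print(round(p, 2), round(r, 2), round(f, 3))  # 0.6 0.75 0.667
```

This also makes concrete why the paper's crowdsourcing contribution matters: recall can only be measured precisely if close to all relevant items in the dataset are actually known.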