Longitudinal reliability of Twitter sentiment for measuring mental health and well-being in a UK birth cohort

Nina Di Cara, Oliver Di Davis, Claire Haworth
International Journal for Population Data Science, published 2023-09-18. DOI: 10.23889/ijpds.v8i3.2278

Abstract

Introduction & Background
Social media data is increasingly recognised as an important source of behavioural data. It can provide insights into patterns of life and into how individuals and groups are feeling. However, many studies of the relationship between social media and mental health and well-being have relied on poorly developed ground-truth data: assumed ground-truth labels and data from a single timepoint. As a result, the accuracy of their models at future timepoints cannot be assessed. Collecting Twitter data from cohorts addresses this issue, because cohorts hold many years of high-quality data that can serve as ground truth. Cohorts can in turn benefit from the higher-resolution data that social media provides, which can supplement their traditional data collection methods.

Objectives & Approach
We used Twitter data collected with consent from two generations of the Avon Longitudinal Study of Parents and Children (ALSPAC) (N=656). The data are linked to two surveys, completed in April-May 2020 and May-July 2020, which include validated outcome measures of anxiety, depression and general well-being. Using the LIWC and VADER sentiment algorithms, we selected the sentiment categories most strongly associated with each outcome and used them to build a multiple regression model for each of anxiety, depression and general well-being at the first survey timepoint. The error of these models when predicting the second timepoint allowed us to assess how well each outcome is predicted across demographic groups.

Relevance to Digital Footprints
Digital footprint data can complement traditional data sources to provide a more nuanced view of health inequalities. These data are typically more timely to collect than traditional sources such as censuses and surveys, allowing a more responsive reaction to emerging issues such as the cost-of-living crisis.
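The Objectives & Approach steps (select the sentiment categories most associated with an outcome, fit a multiple regression at timepoint 1, measure prediction error at timepoint 2 by demographic group) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline. It assumes each participant's tweets have already been scored with LIWC and VADER and aggregated into one number per sentiment category for each survey window, ranks categories by absolute Pearson correlation, keeps a hypothetical top `k`, fits ordinary least squares, and reports mean absolute error. The record layout (`features`, outcome and group field names) and all function names are assumptions made for illustration; only the standard library is used, so the regression is solved through the normal equations.

```python
# Hypothetical sketch; not the study's code. Each row is assumed to look like
# {"features": {"category": value, ...}, "<outcome>": score, "<group>": label}.
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def top_features(rows, outcome, k):
    """Assumed selection rule: the k categories with the largest |r| at timepoint 1."""
    names = rows[0]["features"].keys()
    y = [r[outcome] for r in rows]
    scored = [(abs(pearson([r["features"][f] for r in rows], y)), f) for f in names]
    return [f for _, f in sorted(scored, reverse=True)[:k]]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, feats, outcome):
    """Multiple linear regression with intercept via (X^T X) b = X^T y."""
    X = [[1.0] + [r["features"][f] for f in feats] for r in rows]
    y = [r[outcome] for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)  # [intercept, coef_1, ..., coef_k]


def predict(coef, feats, row):
    return coef[0] + sum(c * row["features"][f] for c, f in zip(coef[1:], feats))


def error_by_group(coef, feats, rows, outcome, group_key):
    """Mean absolute error of timepoint-2 predictions, split by a demographic field."""
    errs = defaultdict(list)
    for r in rows:
        errs[r[group_key]].append(abs(predict(coef, feats, r) - r[outcome]))
    return {g: sum(e) / len(e) for g, e in errs.items()}
```

In use, one would call `feats = top_features(t1_rows, "anxiety", k)`, then `coef = fit_ols(t1_rows, feats, "anxiety")`, and finally `error_by_group(coef, feats, t2_rows, "anxiety", "generation")` (all names hypothetical). A clearly larger error for one group would suggest the timepoint-1 model transfers less well to that group. A real analysis would more likely use a statistics package and guard against constant or collinear categories, which make the normal equations singular.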
Results
This study illustrates how the collection of digital footprint data can be integrated into existing long-term studies, which can provide multiple points of ground-truth data.

Conclusions & Implications
This study has shown that collecting Twitter data and integrating it into cohort studies is feasible, and that cohort data provide multiple ground-truth options. Such time-series data are important for assessing whether mental health can feasibly be inferred from online behavioural data, and this study shows that this may vary across personal characteristics. In future research we plan to link subsequent ALSPAC surveys to provide more ground-truth timepoints, and to explore the temporal stability of the predictions and the impact of model drift on performance.